Demo: Spilling¶
This demo shows in-memory data being spilled to disk while sorting (using the SortExec physical operator).
Configuration¶
Disable Adaptive Query Execution and force spilling at a very low threshold using the spark.shuffle.spill.numElementsForceSpillThreshold configuration property (Spark Core). Use a single shuffle partition so that all rows are sorted in one task.
./bin/spark-shell \
--conf spark.shuffle.spill.numElementsForceSpillThreshold=1 \
--conf spark.sql.adaptive.enabled=false \
--conf spark.sql.shuffle.partitions=1
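Before going further, it may help to confirm that the settings were picked up. The following is a minimal sketch meant to be pasted into the spark-shell session started above (where `spark` is predefined); the value names are illustrative only.

```scala
// Assumes the spark-shell session launched with the --conf options above.
// Spark Core setting: read it from the SparkConf of the running SparkContext.
val forceSpillThreshold =
  spark.sparkContext.getConf.get("spark.shuffle.spill.numElementsForceSpillThreshold")

// Spark SQL settings: read them from the session's runtime configuration.
val aqeEnabled = spark.conf.get("spark.sql.adaptive.enabled")
val shufflePartitions = spark.conf.get("spark.sql.shuffle.partitions")

println(s"force-spill threshold = $forceSpillThreshold")
println(s"AQE enabled           = $aqeEnabled")
println(s"shuffle partitions    = $shufflePartitions")
```

If any of the values differ from what was passed on the command line, the session was most likely started without the `--conf` options.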
Create Table¶
spark.range(2)
.writeTo("tiny")
.using("parquet")
.create
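As a quick sanity check (a sketch, not part of the original demo), the table can be looked up in the catalog and its rows counted. Note that running the `create` step a second time in the same warehouse is expected to fail because the table already exists; dropping it first with `DROP TABLE IF EXISTS tiny` is one way around that.

```scala
// Assumes the same spark-shell session and that the "tiny" table was just created.
val tinyExists = spark.catalog.tableExists("tiny")

// spark.range(2) produces the ids 0 and 1, so two rows are expected.
val rowCount = spark.table("tiny").count()

println(s"tiny exists: $tinyExists, rows: $rowCount")
```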
Spilling¶
One of the physical operators that are susceptible to spilling is SortExec.
spark.table("tiny")
.orderBy("id")
.write
.format("noop")
.mode("overwrite")
.save
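To see whether the sort really spilled, one option is to check the physical plan for SortExec and to sum the spill counters that tasks report. The sketch below assumes the same spark-shell session; the listener and the counter names (`spillListener`, `memorySpilled`, `diskSpilled`) are hypothetical helpers introduced here, and the one-second sleep is a crude way to wait for asynchronously delivered listener events.

```scala
import java.util.concurrent.atomic.AtomicLong
import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}
import org.apache.spark.sql.execution.SortExec

// With AQE disabled, the executed plan can be searched for SortExec directly.
val sorted = spark.table("tiny").orderBy("id")
val sortOperators = sorted.queryExecution.executedPlan.collect { case s: SortExec => s }
println(s"SortExec operators in the plan: ${sortOperators.size}")

// Accumulate the spill counters reported by every finished task.
val memorySpilled = new AtomicLong(0L)
val diskSpilled = new AtomicLong(0L)
val spillListener = new SparkListener {
  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    val metrics = taskEnd.taskMetrics
    if (metrics != null) { // metrics can be missing for failed tasks
      memorySpilled.addAndGet(metrics.memoryBytesSpilled)
      diskSpilled.addAndGet(metrics.diskBytesSpilled)
    }
  }
}
spark.sparkContext.addSparkListener(spillListener)

// Same query as above, executed through the noop data source.
sorted.write.format("noop").mode("overwrite").save()

// Listener events arrive asynchronously; wait briefly before reading the counters.
Thread.sleep(1000)
spark.sparkContext.removeSparkListener(spillListener)
println(s"memory bytes spilled: ${memorySpilled.get}, disk bytes spilled: ${diskSpilled.get}")
```

Non-zero counters suggest that the sorter spilled. The same numbers should also be visible in the web UI in the task metrics of the stage that ran the sort.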
FIXME Why does show not work (as format("noop") does)?
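One possible explanation, offered as an assumption rather than a confirmed answer: show fetches only the first rows, so it adds a limit on top of the sort, and Spark usually plans a sort followed by a small limit as TakeOrderedAndProjectExec instead of SortExec. The sketch below approximates what show does with an explicit `limit(21)` (show displays 20 rows by default and asks for one more) and checks which operator ends up in the plan.

```scala
import org.apache.spark.sql.execution.{SortExec, TakeOrderedAndProjectExec}

// Assumption: limit(21) stands in for the limit that show applies internally.
val limited = spark.table("tiny").orderBy("id").limit(21)
val limitedPlan = limited.queryExecution.executedPlan

// Look for the two candidate operators in the physical plan.
val usesTopK = limitedPlan.collect { case t: TakeOrderedAndProjectExec => t }.nonEmpty
val usesSort = limitedPlan.collect { case s: SortExec => s }.nonEmpty

println(s"TakeOrderedAndProjectExec: $usesTopK, SortExec: $usesSort")
limited.explain()
```

If the plan shows TakeOrderedAndProjectExec and no SortExec, the query never goes through the external sorter that the force-spill threshold applies to, which would explain why no spilling is observed.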