
Demo: Spilling

This demo shows in-memory data being spilled to disk while sorting (using the SortExec physical operator).

Spilling in Sort Physical Operator

Configuration

Disable Adaptive Query Execution and force spilling at a very low threshold using spark.shuffle.spill.numElementsForceSpillThreshold (Spark Core).

./bin/spark-shell \
  --conf spark.shuffle.spill.numElementsForceSpillThreshold=1 \
  --conf spark.sql.adaptive.enabled=false \
  --conf spark.sql.shuffle.partitions=1
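
Once the shell is up, it may be worth confirming that the settings took effect. The following is a minimal check, assuming the `spark` session that spark-shell predefines. It is not part of the original demo.

```scala
// Run inside spark-shell, where `spark` is the predefined SparkSession.

// Spark Core setting, read from the SparkConf the shell was started with
println(spark.sparkContext.getConf.get("spark.shuffle.spill.numElementsForceSpillThreshold"))

// Spark SQL settings, read from the session's runtime configuration
println(spark.conf.get("spark.sql.adaptive.enabled"))
println(spark.conf.get("spark.sql.shuffle.partitions"))
```

With the command line above, the three values should match what was passed with --conf.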

Create Table

// Write two rows (ids 0 and 1) to a new parquet table named "tiny"
spark.range(2)
  .writeTo("tiny")
  .using("parquet")
  .create()

Spilling

One of the physical operators that are susceptible to spilling is SortExec.

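// Sort the table and write the result to the no-op sink,
// which executes the whole query but discards the output
spark.table("tiny")
  .orderBy("id")
  .write
  .format("noop")
  .mode("overwrite")
  .save()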
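
Besides the web UI, the metrics of the SortExec operator can be read programmatically. The sketch below assumes that collect (used here instead of the noop write, so that the metrics end up on a query execution we hold a reference to) spills the same way as the write does. That assumption has not been verified in this demo. The name `sorted` is only for illustration.

```scala
import org.apache.spark.sql.execution.SortExec

val sorted = spark.table("tiny").orderBy("id")

// collect() runs the query using sorted.queryExecution,
// so the metrics printed below belong to this very execution
sorted.collect()

// Find SortExec operators in the executed plan and print all their metrics
sorted.queryExecution.executedPlan
  .collect { case s: SortExec => s }
  .foreach { s =>
    s.metrics.foreach { case (name, metric) =>
      println(s"$name = ${metric.value}")
    }
  }
```

Printing every metric avoids relying on specific metric names, which may differ between Spark versions.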
FIXME

Why does show not trigger spilling (while format("noop") does)?
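
One possible explanation (an assumption, not something this demo confirms) is that show adds a limit on top of the sort, and Spark SQL may then plan the query with a different physical operator than SortExec. Comparing the physical plans is one way to check. The `limit(21)` below only approximates what show requests with its default of 20 rows.

```scala
val q = spark.table("tiny").orderBy("id")

// Physical plan of the sorted query itself (what the noop write executes)
q.explain()

// Physical plan with a limit on top of the sort,
// approximating what show() asks for (an assumption to verify)
q.limit(21).explain()
```

If the second plan contains no Sort operator, that would be consistent with show not spilling in SortExec.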

web UI

Details for Stage

Figure: Details for Stage

Tasks

Figure: Details for Stage: Tasks
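
The spill figures shown per task in the web UI are also available from task metrics. The following is a minimal sketch that registers a listener and prints them when a task ends. The listener and the name `spillListener` are hypothetical helpers, not part of the original demo.

```scala
import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

// Hypothetical helper listener that prints spill metrics of every finished task
val spillListener = new SparkListener {
  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    // taskMetrics may be null (e.g. for some failed tasks), hence the Option
    Option(taskEnd.taskMetrics).foreach { m =>
      println(s"stage=${taskEnd.stageId} " +
        s"memoryBytesSpilled=${m.memoryBytesSpilled} " +
        s"diskBytesSpilled=${m.diskBytesSpilled}")
    }
  }
}
spark.sparkContext.addSparkListener(spillListener)

// Re-run the sort so the listener sees its tasks
spark.table("tiny")
  .orderBy("id")
  .write
  .format("noop")
  .mode("overwrite")
  .save()

// Listener events are delivered asynchronously, so the output may lag a little.
// When done: spark.sparkContext.removeSparkListener(spillListener)
```

Non-zero values for a task of the sorting stage would correspond to the spill columns in the Tasks table.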