Spillable

Spillable is an extension of the MemoryConsumer abstraction for in-memory collections that can spill to disk.

Spillable[C] is a parameterized type of C combiner (partial) values.

Contract

forceSpill

forceSpill(): Boolean

Force spilling the current in-memory collection to disk to release memory.

Used when Spillable is requested to spill

spill

spill(
  collection: C): Unit

Spills the current in-memory collection to disk, and releases the memory.

Used when:

Implementations

Memory Threshold

Spillable uses a threshold for the memory size (in bytes) to know when to spill to disk.

When the size of the in-memory collection reaches the threshold, Spillable tries to acquire more memory. If the memory granted is not enough to lift the threshold above the current size, Spillable spills the collection to disk.

The memory threshold starts at the value of the spark.shuffle.spill.initialMemoryThreshold configuration property. It is increased every time Spillable is requested to spill to disk if necessary and manages to acquire more memory. The threshold goes back to the initial value when Spillable is requested to release all memory.

Used when Spillable is requested to spill and releaseMemory.
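
The threshold bookkeeping described in this section can be sketched as a small standalone model. This is only an illustration, not Spark's code: ThresholdTracker, acquire, shouldSpill and release are hypothetical names, and the size of the memory request (enough to double the current size) is an assumption.

```scala
// Toy model of the memory threshold bookkeeping; not Spark's actual class.
// `acquire` stands in for asking a memory manager for more bytes and
// returns how many bytes were actually granted (possibly fewer than asked).
final class ThresholdTracker(initialThreshold: Long, acquire: Long => Long) {
  private var threshold: Long = initialThreshold

  def currentThreshold: Long = threshold

  /** Returns true when a collection of `currentMemory` bytes should spill. */
  def shouldSpill(currentMemory: Long): Boolean = {
    if (currentMemory >= threshold) {
      // Assumption: request enough memory to double the current size.
      val requested = 2 * currentMemory - threshold
      val granted = acquire(requested)
      threshold += granted
    }
    // Still at or above the (possibly raised) threshold: spill.
    currentMemory >= threshold
  }

  /** Resets the threshold to the initial value, as when all memory is released. */
  def release(): Unit = {
    threshold = initialThreshold
  }
}

object ThresholdTrackerDemo {
  def main(args: Array[String]): Unit = {
    val mb = 1024L * 1024
    // A memory manager that grants everything, and one that grants nothing.
    val generous = new ThresholdTracker(5 * mb, requested => requested)
    val starved = new ThresholdTracker(5 * mb, _ => 0L)
    println(generous.shouldSpill(6 * mb)) // prints false (threshold was raised)
    println(starved.shouldSpill(6 * mb))  // prints true (nothing was granted)
  }
}
```

In this model the threshold only grows while memory is granted, and a single release() call brings it back to the starting value.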

Creating Instance

Spillable takes the following to be created:

Abstract Class

Spillable is an abstract class and cannot be created directly. It is created indirectly for the concrete Spillables.
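
To illustrate how a concrete collection could fulfil the contract, here is a minimal standalone sketch. ToySpillable and ToyBuffer are hypothetical stand-ins (they do not extend Spark's Spillable or MemoryConsumer), and the "disk" is just an in-memory list of spilled runs.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in that mirrors the two contract methods.
abstract class ToySpillable[C] {
  protected def spill(collection: C): Unit
  protected def forceSpill(): Boolean
}

// A toy collection of Ints that "spills" by moving its contents aside.
final class ToyBuffer extends ToySpillable[ArrayBuffer[Int]] {
  private var current = ArrayBuffer.empty[Int]
  private val spilled = ArrayBuffer.empty[Vector[Int]] // pretend disk

  def insert(x: Int): Unit = { current += x }
  def inMemorySize: Int = current.size
  def spilledRuns: Vector[Vector[Int]] = spilled.toVector

  // Writes the given collection out as one spilled run.
  override protected def spill(collection: ArrayBuffer[Int]): Unit = {
    spilled += collection.toVector
  }

  // Spills the current collection if there is anything to spill.
  override protected def forceSpill(): Boolean = {
    if (current.isEmpty) {
      false
    } else {
      spill(current)
      current = ArrayBuffer.empty[Int] // memory is released here
      true
    }
  }

  // Public entry point for the demo (hypothetical helper).
  def requestSpill(): Boolean = forceSpill()
}
```

For example, inserting 1 and 2 and then calling requestSpill() leaves the buffer empty with a single spilled run holding both values.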

Configuration Properties

spark.shuffle.spill.numElementsForceSpillThreshold

Spillable uses spark.shuffle.spill.numElementsForceSpillThreshold configuration property to force spilling in-memory objects to disk when requested to maybeSpill.

spark.shuffle.spill.initialMemoryThreshold

Spillable uses spark.shuffle.spill.initialMemoryThreshold configuration property as the initial threshold for the size of a collection (and the minimum memory required to operate properly).

Spillable uses it when requested to spill and releaseMemory.
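
As a configuration sketch, both properties can be set on a SparkConf before the application starts. This assumes spark-core is on the classpath. The application name and the values are illustrative only, not recommendations.

```scala
import org.apache.spark.SparkConf

object SpillConfExample {
  // Illustrative values: a 10 MB initial threshold (given in bytes) and
  // a forced spill once more than one million elements have been read.
  val conf: SparkConf = new SparkConf()
    .setAppName("spill-conf-example")
    .set("spark.shuffle.spill.initialMemoryThreshold", (10L * 1024 * 1024).toString)
    .set("spark.shuffle.spill.numElementsForceSpillThreshold", "1000000")
}
```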

Releasing All Memory

releaseMemory(): Unit

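releaseMemory releases the acquired memory and resets the memory threshold to the initial value (spark.shuffle.spill.initialMemoryThreshold).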

releaseMemory is used when:

  • ExternalAppendOnlyMap is requested to freeCurrentMap
  • ExternalSorter is requested to stop
  • Spillable is requested to maybeSpill and spill (and spilled to disk in either case)

Spilling In-Memory Collection to Disk (to Release Memory)

spill(
  collection: C): Unit

spill spills the given in-memory collection to disk to release memory.

spill is used when:

forceSpill

forceSpill(): Boolean

forceSpill forcefully spills the Spillable to disk to release memory.

forceSpill is used when Spillable is requested to spill an in-memory collection to disk.

Spilling to Disk if Necessary

maybeSpill(
  collection: C,
  currentMemory: Long): Boolean

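maybeSpill spills the given in-memory collection to disk when the current memory size is over the memory threshold and not enough memory could be acquired, or when the number of elements exceeds spark.shuffle.spill.numElementsForceSpillThreshold. After spilling, maybeSpill releases the memory. maybeSpill returns true when the collection was spilled and false otherwise.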
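
The decision to spill can be sketched as a standalone toy that combines a memory check with the element-count trigger. ToySpillDecider, addElement and overMemoryThreshold are hypothetical names. The memory check is injected as a function so the sketch stays small. Resetting the element counter after a spill is an assumption of this model.

```scala
// Toy stand-in for the "spill if necessary" decision; not Spark's class.
final class ToySpillDecider(
    numElementsForceSpillThreshold: Long,
    overMemoryThreshold: Long => Boolean) {

  private var elementsRead = 0L
  private var spills = 0

  def spillCount: Int = spills

  /** Call once per inserted element (hypothetical helper). */
  def addElement(): Unit = { elementsRead += 1 }

  /** Runs `spill` when either trigger fires; returns whether it spilled. */
  def maybeSpill(currentMemory: Long)(spill: () => Unit): Boolean = {
    val shouldSpill =
      overMemoryThreshold(currentMemory) ||
        elementsRead > numElementsForceSpillThreshold
    if (shouldSpill) {
      spills += 1
      spill()
      elementsRead = 0 // assumption: the counter restarts after a spill
    }
    shouldSpill
  }
}
```

With a force-spill threshold of 2 and a memory check that never fires, the third added element makes the next maybeSpill call return true, and the call after that returns false again.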

maybeSpill is used when: