Spillable

Spillable is an extension of the MemoryConsumer abstraction for collections that can spill to disk.

Spillable[C] is a parameterized type, where C is the type of the in-memory collection of combiner (partial) values.

Creating Instance

Spillable takes a single TaskMemoryManager when created.

Spillable is an abstract class and cannot be created directly. It is created indirectly for the concrete Spillables.

Extensions

Table 1. Spillables
Spillable Description

ExternalAppendOnlyMap

ExternalSorter

Configuration Properties

spark.shuffle.spill.numElementsForceSpillThreshold

Spillable uses the spark.shuffle.spill.numElementsForceSpillThreshold configuration property to force spilling in-memory objects to disk when requested to maybeSpill.

spark.shuffle.spill.initialMemoryThreshold

Spillable uses the spark.shuffle.spill.initialMemoryThreshold configuration property as the initial threshold for the size of a collection (and the minimum memory required to operate properly).

Spillable uses it when requested to spill and releaseMemory.

Memory Threshold

Spillable uses a threshold for the memory size (in bytes) to know when to spill to disk.

When the size of the in-memory collection is above the threshold, Spillable tries to acquire more memory. If it is not granted all of the requested memory, Spillable spills to disk.

The memory threshold starts at the value of the spark.shuffle.spill.initialMemoryThreshold configuration property. It is increased every time Spillable is asked whether to spill to disk and manages to acquire the required memory instead. The threshold goes back to the initial value when Spillable is requested to release all memory.

The memory threshold is used when Spillable is requested to spill and releaseMemory.
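The threshold bookkeeping described above can be sketched as a small, self-contained model. This is not Spark's Spillable: ThresholdModel, shouldSpill, releaseAll and the freeMemory pool are hypothetical names, and the amount requested (enough to double the current size) is an assumption made only for illustration. The sketch follows the rule as stated in this section: spill unless all requested memory was granted.

```scala
// Toy model of the memory-threshold bookkeeping (illustrative names, not Spark's API).
final class ThresholdModel(initialThreshold: Long, initialFreeMemory: Long) {
  private var threshold: Long = initialThreshold
  private var free: Long = initialFreeMemory

  def currentThreshold: Long = threshold
  def freeMemory: Long = free

  // Hypothetical stand-in for a memory manager: grants at most what is still free.
  private def acquire(requested: Long): Long = {
    val granted = math.min(requested, free)
    free -= granted
    granted
  }

  // Returns true when the in-memory collection should be spilled to disk.
  def shouldSpill(currentMemory: Long): Boolean =
    if (currentMemory <= threshold) {
      false // still under the threshold, nothing to do
    } else {
      // Assumption: ask for enough memory to double the current size.
      val requested = 2 * currentMemory - threshold
      val granted = acquire(requested)
      threshold += granted   // the threshold grows by whatever was granted
      granted < requested    // spill unless the full request was granted
    }

  // Gives back everything above the initial threshold and resets the threshold.
  def releaseAll(): Unit = {
    free += threshold - initialThreshold
    threshold = initialThreshold
  }
}
```

In this model, a caller that gets true from shouldSpill would write the collection to disk and then call releaseAll, which returns the threshold to its initial value.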

Releasing All Memory

releaseMemory(): Unit

releaseMemory…​FIXME

releaseMemory is used when:

  • ExternalAppendOnlyMap is requested to freeCurrentMap

  • ExternalSorter is requested to stop

  • Spillable is requested to maybeSpill or spill (and has spilled to disk, in either case)

Spilling In-Memory Collection to Disk (to Release Memory)

spill(
  collection: C): Unit

spill spills the given in-memory collection to disk to release memory.

spill is used when:

forceSpill Method

forceSpill(): Boolean

forceSpill forcefully spills the Spillable to disk to release memory.

forceSpill is used when Spillable is requested to spill an in-memory collection to disk.
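To make the relationship between the two signatures concrete, here is a minimal toy sketch of an abstract class parameterized by the collection type, with one concrete extension. ToySpillable, ToyBuffer, insert, spilledRuns and inMemorySize are hypothetical names introduced for illustration. The "disk" is an in-memory buffer of sorted runs, and the visibility and behavior of the methods are assumptions rather than Spark's actual implementation.

```scala
import scala.collection.mutable.ArrayBuffer

// Toy contract mirroring the spill and forceSpill signatures; not Spark's class.
abstract class ToySpillable[C] {
  // Writes the given in-memory collection out and frees it.
  def spill(collection: C): Unit
  // Spills whatever is currently held in memory; true if anything was spilled.
  def forceSpill(): Boolean
}

// Hypothetical concrete extension that keeps integers in an ArrayBuffer.
final class ToyBuffer extends ToySpillable[ArrayBuffer[Int]] {
  private val inMemory = ArrayBuffer.empty[Int]
  // Stands in for files on disk: each spill appends one sorted run.
  val spilledRuns: ArrayBuffer[Vector[Int]] = ArrayBuffer.empty

  def insert(x: Int): Unit = inMemory += x
  def inMemorySize: Int = inMemory.size

  def spill(collection: ArrayBuffer[Int]): Unit = {
    spilledRuns += collection.toVector.sorted
    collection.clear() // the memory held by the collection is released
  }

  def forceSpill(): Boolean =
    if (inMemory.isEmpty) false
    else { spill(inMemory); true }
}
```

The abstract class cannot be instantiated on its own; only the concrete ToyBuffer can, which parallels how Spillable is created only through its concrete extensions.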

Spilling to Disk if Necessary

maybeSpill(
  collection: C,
  currentMemory: Long): Boolean

maybeSpill…​FIXME

maybeSpill is used when: