ShuffleExternalSorter — Cache-Efficient Sorter

ShuffleExternalSorter is a specialized cache-efficient sorter that sorts arrays of compressed record pointers and partition ids. By using only 8 bytes of space per record in the sorting array, ShuffleExternalSorter can fit more of the array into cache.

ShuffleExternalSorter is created exclusively when UnsafeShuffleWriter is created (and requested to open).

ShuffleExternalSorter is a concrete MemoryConsumer that can spill to disk to free up execution memory.

ShuffleExternalSorter uses the page size to be the minimum of PackedRecordPointer.MAXIMUM_PAGE_SIZE_BYTES and pageSizeBytes, and Tungsten memory mode).

Enable ALL logging levels for org.apache.spark.shuffle.sort.ShuffleExternalSorter logger to see what happens in ShuffleExternalSorter.

Add the following line to conf/

Refer to Logging.

getMemoryUsage Internal Method

long getMemoryUsage()


getMemoryUsage is used when…​FIXME

closeAndGetSpills Method


insertRecord Method


freeMemory Method


getPeakMemoryUsedBytes Method


writeSortedFile Internal Method

void writeSortedFile(boolean isLastFile)


writeSortedFile is used when…​FIXME

cleanupResources Method


Creating ShuffleExternalSorter Instance

ShuffleExternalSorter takes the following to be created:

ShuffleExternalSorter uses spark.shuffle.file.buffer (for fileBufferSizeBytes) and spark.shuffle.spill.numElementsForceSpillThreshold (for numElementsForSpillThreshold) Spark properties.

ShuffleExternalSorter creates a ShuffleInMemorySorter (with spark.shuffle.sort.useRadixSort Spark property enabled by default).

ShuffleExternalSorter initializes the internal registries and counters.

Spilling To Disk (Freeing Up Execution Memory) — spill Method

long spill(
  long size,
  MemoryConsumer trigger)
spill is part of the MemoryConsumer Contract to sort and spill the current records due to memory pressure.

spill prints out the following INFO message to the logs:

Thread [threadId] spilling sort data of [memoryUsage] to disk ([spillsSize] [time|times] so far)

spill writeSortedFile (with the isLastFile flag disabled).

spill frees execution memory (and records the memory bytes spilled as spillSize).

spill then requests the ShuffleInMemorySorter to reset followed by requesting the TaskMetrics (of the TaskContext) to increase the memory bytes spilled.

In the end, spill returns the memory bytes spilled (spill size).

spill returns 0 when one of the following holds:

Internal Properties

Table 1. ShuffleExternalSorter’s Internal Properties (e.g. Registries, Counters and Flags)
Name Description



Used when…​FIXME