ShuffleInMemorySorter

ShuffleInMemorySorter is used by ShuffleExternalSorter to sort pointers of key-value records and partition IDs using radix or tim sort algorithms.

ShuffleInMemorySorter
Figure 1. ShuffleInMemorySorter and ShuffleExternalSorter

Creating Instance

ShuffleInMemorySorter takes the following to be created:

ShuffleInMemorySorter requests the given MemoryConsumer to allocate an array of the given initial size for the Unsafe LongArray of Record Pointers and Partition IDs.

ShuffleInMemorySorter is created for a ShuffleExternalSorter.

Iterator of Records Sorted

ShuffleSorterIterator getSortedIterator()

getSortedIterator…​FIXME

getSortedIterator is used when ShuffleExternalSorter is requested to writeSortedFile.

Resetting

void reset()

reset…​FIXME

reset is used when…​FIXME

numRecords Method

int numRecords()

numRecords…​FIXME

numRecords is used when…​FIXME

Calculating Usable Capacity

int getUsableCapacity()

getUsableCapacity calculates the capacity that is a half or two-third of the memory used for the LongArray.

getUsableCapacity is used when…​FIXME

Logging

Enable ALL logging level for org.apache.spark.shuffle.sort.ShuffleExternalSorter logger to see what happens inside.

Add the following line to conf/log4j.properties:

log4j.logger.org.apache.spark.shuffle.sort.ShuffleExternalSorter=ALL

Refer to Logging.

Internal Properties

Unsafe LongArray of Record Pointers and Partition IDs

ShuffleInMemorySorter uses a LongArray.

Usable Capacity

ShuffleInMemorySorter…​FIXME