SortShuffleManager is the default and only known ShuffleManager in Apache Spark (with the short names sort and tungsten-sort).
SortShuffleManager takes a single SparkConf to be created.
SortShuffleManager allows for (1 << 24) partition identifiers that can be encoded (i.e. 16777216).
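The 24-bit limit comes from the serialized (tungsten-sort) shuffle packing the partition identifier into a 64-bit record pointer (Spark's PackedRecordPointer uses 24 bits for the partition, 13 for the memory page and 27 for the in-page offset). A minimal sketch of the idea, using a simplified 24 + 40 bit split rather than Spark's exact layout:

```scala
// Simplified sketch: partition id in the top 24 bits of a Long, the low
// 40 bits left for the record address (Spark's PackedRecordPointer actually
// splits those 40 bits into a 13-bit page number and a 27-bit offset).
val MaxPartitionId: Int = (1 << 24) - 1

def pack(partitionId: Int, address: Long): Long = {
  require(partitionId >= 0 && partitionId <= MaxPartitionId)
  (partitionId.toLong << 40) | (address & ((1L << 40) - 1))
}

def unpackPartitionId(packed: Long): Int = (packed >>> 40).toInt
```

Any partition identifier above the 24-bit maximum simply cannot be represented in the packed pointer, which is why the serialized shuffle path refuses shuffles with more partitions.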
numMapsForShuffle is a lookup table with the number of mappers producing the output for a shuffle (as a java.util.concurrent.ConcurrentHashMap).
shuffleBlockResolver is an IndexShuffleBlockResolver that is created immediately when SortShuffleManager is.
shuffleBlockResolver is part of the ShuffleManager abstraction.
getWriter[K, V]( handle: ShuffleHandle, mapId: Int, context: TaskContext): ShuffleWriter[K, V]
NOTE: getWriter expects that the input ShuffleHandle is of type BaseShuffleHandle. Moreover, in two out of the three cases, getWriter expects the handle to be one of the more specialized subtypes, SerializedShuffleHandle or BypassMergeSortShuffleHandle.
getWriter then creates a new ShuffleWriter based on the type of the given ShuffleHandle.
getWriter is part of the ShuffleManager abstraction.
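The dispatch on the handle's type can be sketched with simplified stand-ins for the handle and writer classes (the real ones live in org.apache.spark.shuffle.sort and take many more constructor arguments). Note the match order: both specialized handles extend BaseShuffleHandle, so they must be matched first.

```scala
// Simplified stand-ins for Spark's shuffle handle hierarchy.
class BaseShuffleHandle(val shuffleId: Int)
class BypassMergeSortShuffleHandle(shuffleId: Int) extends BaseShuffleHandle(shuffleId)
class SerializedShuffleHandle(shuffleId: Int) extends BaseShuffleHandle(shuffleId)

// getWriter picks the ShuffleWriter matching the handle's runtime type;
// the specialized handles must come before the base case.
def writerFor(handle: BaseShuffleHandle): String = handle match {
  case _: SerializedShuffleHandle      => "UnsafeShuffleWriter"
  case _: BypassMergeSortShuffleHandle => "BypassMergeSortShuffleWriter"
  case _: BaseShuffleHandle            => "SortShuffleWriter"
}
```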
unregisterShuffle( shuffleId: Int): Boolean
unregisterShuffle tries to remove the given shuffleId from the numMapsForShuffle internal registry.

If the given shuffleId was registered, unregisterShuffle requests the IndexShuffleBlockResolver to remove the shuffle index and data files one by one (up to the number of mappers producing the output for the shuffle).
unregisterShuffle is part of the ShuffleManager abstraction.
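A runnable sketch of the above, with the resolver replaced by a stand-in that merely records which (shuffleId, mapId) pairs it was asked to clean up (the real IndexShuffleBlockResolver deletes the shuffle index and data files):

```scala
import java.util.concurrent.ConcurrentHashMap
import scala.collection.mutable.ArrayBuffer

// Lookup table: shuffleId -> number of mappers producing output for it.
val numMapsForShuffle = new ConcurrentHashMap[Int, java.lang.Integer]()

// Stand-in for IndexShuffleBlockResolver: records requested removals.
val removed = ArrayBuffer.empty[(Int, Int)]
def removeDataByMap(shuffleId: Int, mapId: Int): Unit =
  removed += ((shuffleId, mapId))

def unregisterShuffle(shuffleId: Int): Boolean = {
  // remove returns null when shuffleId was never registered
  Option(numMapsForShuffle.remove(shuffleId)).foreach { numMaps =>
    (0 until numMaps.intValue).foreach(mapId => removeDataByMap(shuffleId, mapId))
  }
  true // always reports success, even for unknown shuffle IDs
}
```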
registerShuffle[K, V, C]( shuffleId: Int, numMaps: Int, dependency: ShuffleDependency[K, V, C]): ShuffleHandle
registerShuffle returns a new ShuffleHandle that can be one of the following:

1. BypassMergeSortShuffleHandle when the shouldBypassMergeSort condition holds
2. SerializedShuffleHandle when the canUseSerializedShuffle condition holds
3. BaseShuffleHandle otherwise
registerShuffle is part of the ShuffleManager abstraction.
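The selection logic can be sketched as follows, with ShuffleDependency reduced to just the facts the two predicates consult (the Dep case class and its field names are illustrative, not Spark's API; the real thresholds are spark.shuffle.sort.bypassMergeThreshold, 200 by default, and the (1 << 24) partition maximum):

```scala
// Illustrative stand-in for the parts of ShuffleDependency the predicates use.
case class Dep(
  mapSideCombine: Boolean,        // map-side aggregation requested?
  aggregatorDefined: Boolean,     // an Aggregator is defined?
  serializerRelocatable: Boolean, // serializer supports object relocation?
  numPartitions: Int)

val bypassMergeThreshold = 200        // spark.shuffle.sort.bypassMergeThreshold default
val maxSerializedPartitions = 1 << 24 // serialized-shuffle partition limit

def shouldBypassMergeSort(dep: Dep): Boolean =
  !dep.mapSideCombine && dep.numPartitions <= bypassMergeThreshold

def canUseSerializedShuffle(dep: Dep): Boolean =
  dep.serializerRelocatable && !dep.aggregatorDefined &&
    dep.numPartitions <= maxSerializedPartitions

def registerShuffle(dep: Dep): String =
  if (shouldBypassMergeSort(dep)) "BypassMergeSortShuffleHandle"
  else if (canUseSerializedShuffle(dep)) "SerializedShuffleHandle"
  else "BaseShuffleHandle"
```

The order matters: the bypass-merge-sort path is preferred for small, non-aggregated shuffles, then the serialized (tungsten-sort) path, and only then the sort-based fallback.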
getReader[K, C]( handle: ShuffleHandle, startPartition: Int, endPartition: Int, context: TaskContext): ShuffleReader[K, C]
getReader returns a new BlockStoreShuffleReader passing all the input parameters on to it.
getReader assumes that the input ShuffleHandle is of type BaseShuffleHandle.
getReader is part of the ShuffleManager abstraction.
canUseSerializedShuffle( dependency: ShuffleDependency[_, _, _]): Boolean
canUseSerializedShuffle is positive (true) when all of the following hold:

1. The Serializer of the given ShuffleDependency supports relocation of serialized objects
2. No map-side aggregation, i.e. the Aggregator of the given ShuffleDependency is not defined
3. The number of shuffle output partitions of the given ShuffleDependency is at most the supported maximum, i.e. (1 << 24)
canUseSerializedShuffle prints out the following DEBUG message to the logs:
Can use serialized shuffle for shuffle [shuffleId]
Otherwise, canUseSerializedShuffle does not hold and prints out one of the following DEBUG messages:
Can't use serialized shuffle for shuffle [id] because the serializer, [name], does not support object relocation

Can't use serialized shuffle for shuffle [id] because an aggregator is defined

Can't use serialized shuffle for shuffle [id] because it has more than [number] partitions
shouldBypassMergeSort is used when SortShuffleManager is requested to register a shuffle (and creates a ShuffleHandle).
Enable ALL logging level for org.apache.spark.shuffle.sort.SortShuffleManager logger to see what happens inside.

Add the following line to conf/log4j.properties:

log4j.logger.org.apache.spark.shuffle.sort.SortShuffleManager=ALL

Refer to Logging.