ShuffleManager

ShuffleManager is an abstraction of shuffle systems that manage shuffle data.

ShuffleManager is selected using spark.shuffle.manager configuration property.

ShuffleManager is used to create a BlockManager.

Available ShuffleManagers

SortShuffleManager is the default and only known ShuffleManager in Apache Spark.

Accessing ShuffleManager using SparkEnv

The driver and executor access the ShuffleManager instance using SparkEnv.shuffleManager.

val shuffleManager = SparkEnv.get.shuffleManager

Getting ShuffleReader for ShuffleHandle

getReader[K, C](
  handle: ShuffleHandle,
  startPartition: Int,
  endPartition: Int,
  context: TaskContext): ShuffleReader[K, C]

Gives ShuffleReader to read shuffle data in the ShuffleHandle

Used when the following RDDs are requested to compute a partition:

Getting ShuffleWriter for ShuffleHandle

getWriter[K, V](
  handle: ShuffleHandle,
  mapId: Int,
  context: TaskContext): ShuffleWriter[K, V]

Gives ShuffleWriter to write shuffle data in the ShuffleHandle

Used exclusively when ShuffleMapTask is requested to run (and requests the ShuffleWriter to write records for a partition)

Registering Shuffle of ShuffleDependency (and Getting ShuffleHandle)

registerShuffle[K, V, C](
  shuffleId: Int,
  numMaps: Int,
  dependency: ShuffleDependency[K, V, C]): ShuffleHandle

Registers a shuffle (by the given shuffleId and ShuffleDependency) and returns a ShuffleHandle

Used when ShuffleDependency is created (and registers with the shuffle system)

Getting ShuffleBlockResolver

shuffleBlockResolver: ShuffleBlockResolver

Gives ShuffleBlockResolver of the shuffle system

Used when:

Stopping ShuffleManager

stop(): Unit

Stops the shuffle system

Used when SparkEnv is requested to stop

Unregistering Shuffle

unregisterShuffle(
  shuffleId: Int): Boolean

Unregisters a given shuffle

Used when BlockManagerSlaveEndpoint is requested to handle a RemoveShuffle message