Skip to content

ShuffleManager

ShuffleManager is an abstraction of shuffle managers that manage shuffle data.

ShuffleManager is specified using spark.shuffle.manager configuration property.

ShuffleManager is used to create a BlockManager.

Contract

Getting ShuffleReader for ShuffleHandle

getReader[K, C](
  handle: ShuffleHandle,
  startPartition: Int,
  endPartition: Int,
  context: TaskContext,
  metrics: ShuffleReadMetricsReporter): ShuffleReader[K, C]

ShuffleReader to read shuffle data in the ShuffleHandle

Used when the following RDDs are requested to compute a partition:

getReaderForRange

getReaderForRange[K, C](
  handle: ShuffleHandle,
  startMapIndex: Int,
  endMapIndex: Int,
  startPartition: Int,
  endPartition: Int,
  context: TaskContext,
  metrics: ShuffleReadMetricsReporter): ShuffleReader[K, C]

ShuffleReader for a range of reduce partitions to read from map output in the ShuffleHandle

Used when ShuffledRowRDD (Spark SQL) is requested to compute a partition

Getting ShuffleWriter for ShuffleHandle

getWriter[K, V](
  handle: ShuffleHandle,
  mapId: Long,
  context: TaskContext,
  metrics: ShuffleWriteMetricsReporter): ShuffleWriter[K, V]

ShuffleWriter to write shuffle data in the ShuffleHandle

Used when ShuffleWriteProcessor is requested to write a partition

Registering Shuffle of ShuffleDependency (and Getting ShuffleHandle)

registerShuffle[K, V, C](
  shuffleId: Int,
  dependency: ShuffleDependency[K, V, C]): ShuffleHandle

Registers a shuffle (by the given shuffleId and ShuffleDependency) and gives a ShuffleHandle

Used when ShuffleDependency is created (and registers with the shuffle system)

ShuffleBlockResolver

shuffleBlockResolver: ShuffleBlockResolver

ShuffleBlockResolver of the shuffle system

Used when:

Stopping ShuffleManager

stop(): Unit

Stops the shuffle system

Used when SparkEnv is requested to stop

Unregistering Shuffle

unregisterShuffle(
  shuffleId: Int): Boolean

Unregisters a given shuffle

Used when BlockManagerSlaveEndpoint is requested to handle a RemoveShuffle message

Implementations

Accessing ShuffleManager using SparkEnv

ShuffleManager is available on the driver and executors using SparkEnv.shuffleManager.

val shuffleManager = SparkEnv.get.shuffleManager

Last update: 2020-10-09