Skip to content

ShuffleWriteProcessor

ShuffleWriteProcessor controls write behavior in ShuffleMapTasks while writing partition records out to the shuffle system.

ShuffleWriteProcessor is used to create a ShuffleDependency.

Creating Instance

ShuffleWriteProcessor takes no arguments to be created.

ShuffleWriteProcessor is created when:

  • ShuffleDependency is created
  • ShuffleExchangeExec (Spark SQL) physical operator is requested to createShuffleWriteProcessor

Writing Partition Records to Shuffle System

write(
  rdd: RDD[_],
  dep: ShuffleDependency[_, _, _],
  mapId: Long,
  context: TaskContext,
  partition: Partition): MapStatus

write requests the ShuffleManager for the ShuffleWriter for the ShuffleHandle (of the given ShuffleDependency).

write requests the ShuffleWriter to write out records (of the given Partition and RDD).

In the end, write requests the ShuffleWriter to stop (with the success flag enabled).

In case of any Exceptions, write requests the ShuffleWriter to stop (with the success flag disabled).

write is used when ShuffleMapTask is requested to run.

Creating MetricsReporter

createMetricsReporter(
  context: TaskContext): ShuffleWriteMetricsReporter

createMetricsReporter creates a ShuffleWriteMetricsReporter from the given TaskContext.

createMetricsReporter requests the given TaskContext for TaskMetrics and then for the ShuffleWriteMetrics.