
ShuffleStatus

ShuffleStatus is a registry of MapStatuses per Partition of a ShuffleMapStage.

ShuffleStatus is used by MapOutputTrackerMaster.

Creating Instance

ShuffleStatus takes the following to be created:

ShuffleStatus is created when:

MapStatuses per Partition

ShuffleStatus creates a mapStatuses internal registry of MapStatuses per partition (using the numPartitions) when created.

A partition is missing when there is no MapStatus for it (null at the index of the partition ID). Missing partitions can be requested using findMissingPartitions.

Initially, every entry of mapStatuses is null, so all partitions are missing (uncomputed).

A new MapStatus is added in addMapOutput and updateMapOutput.

A MapStatus is removed (nulled) in removeMapOutput and removeOutputsByFilter.

The number of available MapStatuses is tracked by _numAvailableMapOutputs internal counter.
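The registry described above can be sketched as a plain array with one slot per partition. This is a simplified model, not Spark's code: FakeMapStatus and MapStatusRegistry are hypothetical names, and the count is recomputed here whereas ShuffleStatus is described as keeping the _numAvailableMapOutputs counter.

```scala
// Hypothetical stand-in for Spark's MapStatus; only a location is modeled.
final case class FakeMapStatus(location: String)

// Simplified model of the registry: one slot per partition, null means "missing".
final class MapStatusRegistry(numPartitions: Int) {
  // A fresh array of references is filled with nulls,
  // so every partition starts out missing (uncomputed).
  val mapStatuses: Array[FakeMapStatus] = new Array[FakeMapStatus](numPartitions)

  // Number of non-null slots (recomputed here for simplicity).
  def numAvailableMapOutputs: Int = mapStatuses.count(_ != null)

  // Partition IDs whose slot is still null.
  def findMissingPartitions(): Seq[Int] =
    mapStatuses.indices.filter(i => mapStatuses(i) == null)
}
```

With three partitions, all three IDs are reported as missing until a status is stored in a slot; after that, only the remaining null slots are reported.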

Used when:

Registering Shuffle Map Output

addMapOutput(
  mapIndex: Int,
  status: MapStatus): Unit

addMapOutput adds the MapStatus to the mapStatuses internal registry.

If the mapStatuses internal registry had no MapStatus for the mapIndex before the call, addMapOutput also increments the _numAvailableMapOutputs internal counter and calls invalidateSerializedMapOutputStatusCache.
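The behavior described above can be modeled in a few lines. This is a minimal sketch under stated assumptions: FakeMapStatus, AddOnlyShuffleStatus, buildCache and isCacheValid are hypothetical names, the serialized-status cache is reduced to a Boolean flag, and plain synchronized stands in for whatever locking Spark actually uses.

```scala
// Hypothetical stand-in for Spark's MapStatus.
final case class FakeMapStatus(location: String)

final class AddOnlyShuffleStatus(numPartitions: Int) {
  private val mapStatuses = new Array[FakeMapStatus](numPartitions)
  private var _numAvailableMapOutputs = 0
  // Assumption: the serialized-status cache is modeled as a validity flag only.
  private var cacheValid = false

  def addMapOutput(mapIndex: Int, status: FakeMapStatus): Unit = synchronized {
    if (mapStatuses(mapIndex) == null) {
      // First output for this map index: count it and drop the stale cache.
      _numAvailableMapOutputs += 1
      invalidateSerializedMapOutputStatusCache()
    }
    // The status is stored whether or not a previous one existed.
    mapStatuses(mapIndex) = status
  }

  private def invalidateSerializedMapOutputStatusCache(): Unit =
    cacheValid = false

  // Hypothetical helper that pretends the cache has been rebuilt.
  def buildCache(): Unit = synchronized { cacheValid = true }

  def numAvailableMapOutputs: Int = synchronized { _numAvailableMapOutputs }
  def isCacheValid: Boolean = synchronized { cacheValid }
}
```

In this model, registering a second status for an index that already has one replaces the entry but leaves the counter and the cache flag untouched, which mirrors the condition stated above.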

addMapOutput is used when:

Deregistering Shuffle Map Output

removeMapOutput(
  mapIndex: Int,
  bmAddress: BlockManagerId): Unit

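removeMapOutput removes (nulls) the MapStatus for the given mapIndex in the mapStatuses internal registry, provided the registered MapStatus is located at the given bmAddress.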
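The registry description earlier on this page says that removeMapOutput nulls a MapStatus. A minimal sketch of what that could look like follows. The names FakeMapStatus, RemovableShuffleStatus and hasOutput are hypothetical, a String stands in for BlockManagerId, and the location check is an assumption suggested by the bmAddress parameter that should be verified against the Spark source.

```scala
// Hypothetical stand-in for Spark's MapStatus; location stands in for BlockManagerId.
final case class FakeMapStatus(location: String)

final class RemovableShuffleStatus(numPartitions: Int) {
  private val mapStatuses = new Array[FakeMapStatus](numPartitions)
  private var _numAvailableMapOutputs = 0

  def addMapOutput(mapIndex: Int, status: FakeMapStatus): Unit = synchronized {
    if (mapStatuses(mapIndex) == null) _numAvailableMapOutputs += 1
    mapStatuses(mapIndex) = status
  }

  // Assumption: the entry is removed only if it is registered at bmAddress.
  def removeMapOutput(mapIndex: Int, bmAddress: String): Unit = synchronized {
    val current = mapStatuses(mapIndex)
    if (current != null && current.location == bmAddress) {
      _numAvailableMapOutputs -= 1
      mapStatuses(mapIndex) = null // the partition is missing again
    }
  }

  def hasOutput(mapIndex: Int): Boolean = synchronized { mapStatuses(mapIndex) != null }
  def numAvailableMapOutputs: Int = synchronized { _numAvailableMapOutputs }
}
```

Guarding on both the null check and the location keeps the counter consistent: removing an already-missing output, or one registered at a different address, is a no-op in this model.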

removeMapOutput is used when:

Missing Partitions

findMissingPartitions(): Seq[Int]

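findMissingPartitions returns the IDs of the missing partitions, i.e. the partitions with no MapStatus (null) in the mapStatuses internal registry.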

findMissingPartitions is used when:

Serializing Shuffle Map Output Statuses

serializedMapStatus(
  broadcastManager: BroadcastManager,
  isLocal: Boolean,
  minBroadcastSize: Int,
  conf: SparkConf): Array[Byte]

serializedMapStatus...FIXME

serializedMapStatus is used when:

Logging

Enable ALL logging level for org.apache.spark.ShuffleStatus logger to see what happens inside.

Add the following line to conf/log4j.properties:

log4j.logger.org.apache.spark.ShuffleStatus=ALL

Refer to Logging.