MapStatus¶
MapStatus is an abstraction of shuffle map output statuses with an estimated size, location and map Id.
MapStatus is a result of executing a ShuffleMapTask.
After a ShuffleMapTask finishes successfully, DAGScheduler is requested to handle the ShuffleMapTask completion, which in turn requests the MapOutputTrackerMaster to register the MapStatus.
Contract¶
Estimated Size¶
getSizeForBlock(
reduceId: Int): Long
Estimated size (in bytes)
Used when:
- MapOutputTrackerMaster is requested for a MapOutputStatistics and locations with the largest number of shuffle map outputs
- MapOutputTracker utility is used to convert MapStatuses
- OptimizeSkewedJoin (Spark SQL) physical optimization is executed
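As a rough illustration of how per-block size estimates can feed output statistics, the following minimal sketch sums the estimated sizes per reduce partition across several map outputs. `HasBlockSizes`, `fromSizes` and `bytesByReducer` are hypothetical names made up for this example; they are not Spark classes or methods, and the real aggregation may differ.

```scala
object SizeStatsSketch {
  // Hypothetical, minimal stand-in for the size part of the contract (not Spark's trait).
  trait HasBlockSizes {
    def getSizeForBlock(reduceId: Int): Long
  }

  // Helper for the example: a status backed by a fixed list of per-block sizes.
  def fromSizes(sizes: Long*): HasBlockSizes = new HasBlockSizes {
    private val stored = sizes.toVector
    def getSizeForBlock(reduceId: Int): Long = stored(reduceId)
  }

  // For each reduce partition, add up the estimated sizes reported by all map outputs.
  def bytesByReducer(statuses: Seq[HasBlockSizes], numReducers: Int): Array[Long] = {
    val totals = new Array[Long](numReducers)
    for (status <- statuses; reduceId <- 0 until numReducers) {
      totals(reduceId) += status.getSizeForBlock(reduceId)
    }
    totals
  }
}
```

With two map outputs created by `fromSizes`, `bytesByReducer` returns one total per reduce partition, which is the kind of input a skew-detection step could inspect.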
Location¶
location: BlockManagerId
BlockManagerId of the shuffle map output (i.e. the BlockManager where a ShuffleMapTask ran and the result is stored)
Used when:
- ShuffleStatus is requested to removeMapOutput and removeOutputsByFilter
- MapOutputTrackerMaster is requested for locations with the largest number of shuffle map outputs and getMapLocation
- MapOutputTracker utility is used to convert MapStatuses
- DAGScheduler is requested to handle a ShuffleMapTask completion
Map Id¶
mapId: Long
Map Id of the shuffle map output
Used when:
- MapOutputTracker utility is used to convert MapStatuses
Implementations¶
Sealed Trait
MapStatus is a Scala sealed trait, which means that all of its implementations (CompressedMapStatus and HighlyCompressedMapStatus) are in the same compilation unit (a single file).
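To show what sealing buys in practice, here is a small sketch with a hypothetical sealed hierarchy. `Status`, `Exact` and `Approximate` are illustrative names only, not Spark types. Because every subtype lives in the same file, the compiler can check that a pattern match covers all cases.

```scala
object SealedSketch {
  // Hypothetical sealed hierarchy; all subtypes must be declared in this file.
  sealed trait Status
  final case class Exact(sizes: Vector[Long]) extends Status
  final case class Approximate(avgSize: Long) extends Status

  // The match is exhaustive: dropping a case would trigger a compiler warning.
  def describe(status: Status): String = status match {
    case Exact(sizes)     => s"exact sizes for ${sizes.length} blocks"
    case Approximate(avg) => s"approximate size of $avg bytes per block"
  }
}
```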
spark.shuffle.minNumPartitionsToHighlyCompress¶
MapStatus utility uses the spark.shuffle.minNumPartitionsToHighlyCompress internal configuration property as the threshold on the number of partitions above which a HighlyCompressedMapStatus is preferred.
Creating MapStatus¶
apply(
loc: BlockManagerId,
uncompressedSizes: Array[Long],
mapTaskId: Long): MapStatus
apply creates a HighlyCompressedMapStatus when the number of elements in uncompressedSizes (one per reduce partition) is above the minPartitionsToUseHighlyCompressMapStatus threshold. Otherwise, apply creates a CompressedMapStatus.
apply is used when:
- SortShuffleWriter is requested to write records
- BypassMergeSortShuffleWriter is requested to write records
- UnsafeShuffleWriter is requested to close resources and write out merged spill files
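The threshold-based choice made by apply can be sketched as follows. This is a simplified model under stated assumptions, not Spark's implementation: `SketchMapStatus`, `PerBlockSizes`, `AverageSize` and `create` are hypothetical names, the threshold is passed in explicitly instead of being read from configuration, and the compact variant storing only an average for non-empty blocks is an assumption of this example.

```scala
object MapStatusSketch {
  // Simplified stand-in for the contract; location is left out for brevity.
  sealed trait SketchMapStatus {
    def mapId: Long
    def getSizeForBlock(reduceId: Int): Long
  }

  // Detailed variant: keeps one size per reduce partition.
  final case class PerBlockSizes(mapId: Long, sizes: Vector[Long]) extends SketchMapStatus {
    def getSizeForBlock(reduceId: Int): Long = sizes(reduceId)
  }

  // Compact variant (assumption of this sketch): remembers which blocks are empty
  // and reports one average size for all the others.
  final case class AverageSize(mapId: Long, emptyBlocks: Set[Int], avgSize: Long)
      extends SketchMapStatus {
    def getSizeForBlock(reduceId: Int): Long =
      if (emptyBlocks.contains(reduceId)) 0L else avgSize
  }

  // Picks the compact variant only when there are more partitions than the threshold.
  def create(uncompressedSizes: Array[Long], mapTaskId: Long, threshold: Int): SketchMapStatus =
    if (uncompressedSizes.length > threshold) {
      val nonEmpty = uncompressedSizes.filter(size => size > 0L)
      val avg = if (nonEmpty.isEmpty) 0L else nonEmpty.sum / nonEmpty.length
      val empty = uncompressedSizes.indices.filter(i => uncompressedSizes(i) == 0L).toSet
      AverageSize(mapTaskId, empty, avg)
    } else {
      PerBlockSizes(mapTaskId, uncompressedSizes.toVector)
    }
}
```

The trade-off the sketch tries to convey is memory versus precision: the per-block variant grows with the number of reduce partitions, while the compact one stays small at the cost of approximate sizes.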