MapStatus¶
MapStatus
is an abstraction of shuffle map output statuses with an estimated size, location and map Id.
MapStatus
is a result of executing a ShuffleMapTask.
After a ShuffleMapTask has finished execution successfully, DAGScheduler
is requested to handle a ShuffleMapTask completion that in turn requests the MapOutputTrackerMaster to register the MapStatus.
Contract¶
Estimated Size¶
getSizeForBlock(
reduceId: Int): Long
Estimated size (in bytes)
Used when:
MapOutputTrackerMaster
is requested for a MapOutputStatistics and locations with the largest number of shuffle map outputsMapOutputTracker
utility is used to convert MapStatusesOptimizeSkewedJoin
(Spark SQL) physical optimization is executed
Location¶
location: BlockManagerId
BlockManagerId of the shuffle map output (i.e. the BlockManager where a ShuffleMapTask
ran and the result is stored)
Used when:
ShuffleStatus
is requested to removeMapOutput and removeOutputsByFilterMapOutputTrackerMaster
is requested for locations with the largest number of shuffle map outputs and getMapLocationMapOutputTracker
utility is used to convert MapStatusesDAGScheduler
is requested to handle a ShuffleMapTask completion
Map Id¶
mapId: Long
Map Id of the shuffle map output
Used when:
MapOutputTracker
utility is used to convert MapStatuses
Implementations¶
Sealed Trait
MapStatus
is a Scala sealed trait which means that all of the implementations are in the same compilation unit (a single file).
spark.shuffle.minNumPartitionsToHighlyCompress¶
MapStatus
utility uses spark.shuffle.minNumPartitionsToHighlyCompress internal configuration property for the minimum number of partitions to prefer a HighlyCompressedMapStatus.
Creating MapStatus¶
apply(
loc: BlockManagerId,
uncompressedSizes: Array[Long],
mapTaskId: Long): MapStatus
apply
creates a HighlyCompressedMapStatus when the number of uncompressedSizes
is above minPartitionsToUseHighlyCompressMapStatus threshold. Otherwise, apply
creates a CompressedMapStatus.
apply
is used when:
SortShuffleWriter
is requested to write recordsBypassMergeSortShuffleWriter
is requested to write recordsUnsafeShuffleWriter
is requested to close resources and write out merged spill files