
MapStatus

MapStatus is an abstraction of shuffle map output statuses with an estimated size, location and map Id.

MapStatus is the result of executing a ShuffleMapTask.

After a ShuffleMapTask finishes successfully, DAGScheduler is requested to handle the task completion, which in turn requests the MapOutputTrackerMaster to register the MapStatus.

Contract

Estimated Size

getSizeForBlock(
  reduceId: Int): Long

Estimated size (in bytes) of the shuffle block for the given reduce partition
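
The size is an estimate because a status may store block sizes in a lossy, compact form. The following is a minimal sketch of one such scheme, assuming a logarithmic single-byte encoding with base 1.1. The object and method names (SizeEncodingSketch, compressSize, decompressSize) and the base are illustrative assumptions, not taken from this page.

```scala
object SizeEncodingSketch {
  // Assumed logarithm base; a smaller base gives finer resolution per byte value.
  private val LogBase = 1.1

  // Encode a size in bytes as a single byte on a logarithmic scale.
  // Zero stays zero so that empty blocks remain distinguishable.
  def compressSize(size: Long): Byte =
    if (size == 0L) 0.toByte
    else if (size <= 1L) 1.toByte
    else math.min(255, math.ceil(math.log(size.toDouble) / math.log(LogBase)).toInt).toByte

  // Decode the byte back into an approximate size in bytes.
  // The byte is treated as unsigned (0 to 255).
  def decompressSize(compressed: Byte): Long =
    if (compressed == 0) 0L
    else math.pow(LogBase, (compressed & 0xFF).toDouble).toLong
}
```

A round trip such as SizeEncodingSketch.decompressSize(SizeEncodingSketch.compressSize(1000L)) returns an approximation of the original size rather than the exact value, while an empty block still maps to 0.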

Used when:

Location

location: BlockManagerId

BlockManagerId of the shuffle map output (i.e. the BlockManager where a ShuffleMapTask ran and the result is stored)

Used when:

Map Id

mapId: Long

Map Id of the shuffle map output

Used when:

Implementations

Sealed Trait

MapStatus is a Scala sealed trait which means that all of the implementations are in the same compilation unit (a single file).
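
To illustrate what sealing provides, here is a simplified, self-contained sketch of the contract with two stand-in implementations. The BlockManagerId case class, the constructor parameters and the describe helper are hypothetical simplifications for illustration; the real classes carry more state.

```scala
object SealedTraitSketch {
  // Simplified stand-in for Spark's BlockManagerId (hypothetical fields).
  final case class BlockManagerId(executorId: String, host: String, port: Int)

  // Sealed: every implementation must be defined in this same source file.
  sealed trait MapStatus {
    def location: BlockManagerId
    def getSizeForBlock(reduceId: Int): Long
    def mapId: Long
  }

  // Illustrative implementation that keeps one size per reduce partition.
  final class CompressedMapStatus(
      val location: BlockManagerId,
      sizes: Array[Long],
      val mapId: Long) extends MapStatus {
    def getSizeForBlock(reduceId: Int): Long = sizes(reduceId)
  }

  // Illustrative implementation that reports a single average size.
  final class HighlyCompressedMapStatus(
      val location: BlockManagerId,
      avgSize: Long,
      val mapId: Long) extends MapStatus {
    def getSizeForBlock(reduceId: Int): Long = avgSize
  }

  // Because the trait is sealed, the compiler can check that this match
  // covers every implementation and warn if a case is missing.
  def describe(status: MapStatus): String = status match {
    case _: CompressedMapStatus       => "compressed"
    case _: HighlyCompressedMapStatus => "highly compressed"
  }
}
```

Adding a third implementation in the same file without extending describe would make the compiler report a non-exhaustive match, which is the practical benefit of a sealed hierarchy.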

spark.shuffle.minNumPartitionsToHighlyCompress

MapStatus utility uses the spark.shuffle.minNumPartitionsToHighlyCompress internal configuration property as the minimum number of partitions above which a HighlyCompressedMapStatus is preferred.

Creating MapStatus

apply(
  loc: BlockManagerId,
  uncompressedSizes: Array[Long],
  mapTaskId: Long): MapStatus

apply creates a HighlyCompressedMapStatus when the number of elements in uncompressedSizes (one per reduce partition) exceeds the minPartitionsToUseHighlyCompressMapStatus threshold. Otherwise, apply creates a CompressedMapStatus.
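
The selection logic can be sketched as follows. This is a simplified model, not Spark's code: the threshold value of 2000 is assumed here (to my knowledge it is the default of spark.shuffle.minNumPartitionsToHighlyCompress), and the stand-in classes, the average-size computation and the empty-block set are hypothetical simplifications.

```scala
object ApplySketch {
  // Assumed threshold value; in Spark it comes from configuration.
  val minPartitionsToUseHighlyCompressMapStatus: Int = 2000

  // Simplified stand-in for Spark's BlockManagerId (hypothetical fields).
  final case class BlockManagerId(executorId: String, host: String, port: Int)

  sealed trait MapStatus {
    def location: BlockManagerId
    def getSizeForBlock(reduceId: Int): Long
    def mapId: Long
  }

  // Keeps one size per reduce partition.
  final class CompressedMapStatus(
      val location: BlockManagerId,
      sizes: Array[Long],
      val mapId: Long) extends MapStatus {
    def getSizeForBlock(reduceId: Int): Long = sizes(reduceId)
  }

  // Keeps only which blocks are empty plus an average size for the rest.
  final class HighlyCompressedMapStatus(
      val location: BlockManagerId,
      emptyBlocks: Set[Int],
      avgSize: Long,
      val mapId: Long) extends MapStatus {
    def getSizeForBlock(reduceId: Int): Long =
      if (emptyBlocks.contains(reduceId)) 0L else avgSize
  }

  // Pick the implementation from the number of reduce partitions.
  def apply(
      loc: BlockManagerId,
      uncompressedSizes: Array[Long],
      mapTaskId: Long): MapStatus =
    if (uncompressedSizes.length > minPartitionsToUseHighlyCompressMapStatus) {
      val nonEmpty = uncompressedSizes.filter(_ > 0)
      val avg = if (nonEmpty.isEmpty) 0L else nonEmpty.sum / nonEmpty.length
      val empty = uncompressedSizes.indices.filter(i => uncompressedSizes(i) == 0L).toSet
      new HighlyCompressedMapStatus(loc, empty, avg, mapTaskId)
    } else {
      new CompressedMapStatus(loc, uncompressedSizes.clone(), mapTaskId)
    }
}
```

With the assumed threshold, an array of exactly 2000 sizes still yields the per-partition variant, and 2001 sizes yields the highly compressed one, since the comparison is strict.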

apply is used when: