MapStatus Contract — Shuffle Map Output Status

MapStatus is the abstraction of shuffle map output statuses that are metadata of shuffle map outputs with estimated size for the reduce block and block location.

After a ShuffleMapTask has finished execution successfully, DAGScheduler is requested to handleTaskCompletion (of the ShuffleMapTask) that requests the MapOutputTrackerMaster to register the MapStatus.

Table 1. MapStatus Contract
Method Description


getSizeForBlock(reduceId: Int): Long

Estimated size for the reduce block (in bytes)

Used when:


location: BlockManagerId

Block location, i.e. the BlockManager where a ShuffleMapTask ran and the result is stored.

Used when:

Table 2. MapStatuses
MapStatus Description


Default MapStatus that compresses the estimated map output size to 8 bits (Byte) for efficient reporting


Stores the average size of non-empty blocks, and a compressed bitmap for tracking which blocks are empty. Used when the number of partitions is above the spark.shuffle.minNumPartitionsToHighlyCompress threshold

MapStatus object uses spark.shuffle.minNumPartitionsToHighlyCompress internal configuration property for the minimum number of partitions threshold to create a HighlyCompressedMapStatus when requested to create a MapStatus.

Creating MapStatus — apply Factory Method

  loc: BlockManagerId,
  uncompressedSizes: Array[Long]): MapStatus

apply creates a concrete MapStatus per the size of the given uncompressedSizes array:

apply is used when: