

BlockStoreShuffleReader is a ShuffleReader.

Creating Instance

BlockStoreShuffleReader takes the following to be created:

BlockStoreShuffleReader is created when:

  • SortShuffleManager is requested for a ShuffleReader (for a ShuffleHandle and a range of reduce partitions)

Reading Combined Records (for Reduce Task)

read(): Iterator[Product2[K, C]]

read is part of the ShuffleReader abstraction.

read creates a ShuffleBlockFetcherIterator.
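The overall flow of read can be sketched in plain Scala (no Spark dependencies). This is an illustrative stand-in, not Spark's actual API: fetch the per-block record streams, flatten them into one iterator, combine per key when an aggregator is defined, and sort when a key ordering is defined.

```scala
// Hedged sketch of the read() pipeline: all names and signatures are
// illustrative stand-ins for the Spark internals described on this page.
def read[K, C](
    fetchBlocks: () => Iterator[Iterator[(K, C)]],                 // ShuffleBlockFetcherIterator stand-in
    aggregate: Option[Iterator[(K, C)] => Iterator[(K, C)]],       // Aggregator stand-in
    keyOrdering: Option[Ordering[K]]): Iterator[(K, C)] = {
  // 1. Flatten the fetched block streams into a single record iterator
  val records: Iterator[(K, C)] = fetchBlocks().flatten
  // 2. Combine records per key when an aggregator is defined
  val combined = aggregate.map(f => f(records)).getOrElse(records)
  // 3. Sort by key when a key ordering is defined (real code uses ExternalSorter)
  keyOrdering match {
    case Some(ord) => combined.toSeq.sortBy(_._1)(ord).iterator
    case None      => combined
  }
}
```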



fetchContinuousBlocksInBatch: Boolean
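The decision can be sketched as a pure function. The exact conditions below are an assumption based on Spark 3.x behavior: batch fetch must be requested, the serializer must support relocation of serialized objects, compression (if enabled) must use a codec that supports concatenation of serialized streams, and IO encryption must be off.

```scala
// Assumption: simplified restatement of the Spark 3.x conditions for batch fetch.
// Parameter names are illustrative, not Spark's.
def fetchContinuousBlocksInBatch(
    shouldBatchFetch: Boolean,       // requested by the caller
    serializerRelocatable: Boolean,  // serializer supports relocation of serialized objects
    compressionEnabled: Boolean,
    codecConcatenatable: Boolean,    // codec supports concatenation of serialized streams
    ioEncryptionEnabled: Boolean): Boolean =
  shouldBatchFetch && serializerRelocatable &&
    (!compressionEnabled || codecConcatenatable) &&
    !ioEncryptionEnabled
```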


Review Me

Reading Combined Records For Reduce Task

Internally, read first creates a ShuffleBlockFetcherIterator (passing in the values of several Spark configuration properties).

NOTE: read uses MapOutputTracker to find the BlockManagers with the shuffle blocks and their sizes when creating the ShuffleBlockFetcherIterator.

read creates a new SerializerInstance (using the Serializer of the ShuffleDependency).

read creates a key/value iterator by calling deserializeStream on every shuffle block stream.
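The per-block deserialization can be pictured as a flatMap over the block streams. The deserializer below is a hypothetical text-based stand-in (real Spark uses the SerializerInstance's deserializeStream); only the flattening shape matches the description above.

```scala
import java.io.InputStream

// Hypothetical stand-in for deserializeStream: reads "key,value" lines from a stream.
def deserializeStream(in: InputStream): Iterator[(String, Int)] =
  scala.io.Source.fromInputStream(in).getLines().map { line =>
    val Array(k, v) = line.split(",", 2)
    (k, v.toInt)
  }

// Flatten the per-block streams into a single key/value iterator, as read does.
def recordIterator(blockStreams: Iterator[InputStream]): Iterator[(String, Int)] =
  blockStreams.flatMap(deserializeStream)
```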

read updates the context task metrics for each record read.

NOTE: read uses a CompletionIterator (to count the records read) and an InterruptibleIterator (to support task cancellation).
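The metrics/completion behavior can be sketched as a small iterator wrapper in plain Scala. This mimics the CompletionIterator idea only; real Spark additionally checks for task cancellation in next() (the InterruptibleIterator), and all names here are illustrative.

```scala
// Sketch of a CompletionIterator-style wrapper: counts every record handed out
// and runs a callback exactly once when the iterator is drained.
class CountingIterator[A](
    sub: Iterator[A],
    onRecord: () => Unit,      // e.g. bump shuffle read metrics per record
    onCompletion: () => Unit   // e.g. finalize metrics after the last record
) extends Iterator[A] {
  private var completed = false
  override def hasNext: Boolean = {
    val more = sub.hasNext
    if (!more && !completed) { completed = true; onCompletion() }
    more
  }
  override def next(): A = { val a = sub.next(); onRecord(); a }
}
```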

If the ShuffleDependency has an Aggregator defined, read wraps the current iterator in an aggregating iterator: Aggregator.combineCombinersByKey when mapSideCombine is enabled, or Aggregator.combineValuesByKey otherwise.
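The difference between the two Aggregator paths can be shown with a simplified, in-memory stand-in (the real Aggregator uses an ExternalAppendOnlyMap that can spill to disk). With mapSideCombine enabled, the fetched records are already partial (K, C) combiners and only mergeCombiners applies; otherwise they are raw (K, V) pairs built up with createCombiner and mergeValue.

```scala
import scala.collection.mutable

// Simplified in-memory stand-in for Spark's Aggregator.
case class Aggregator[K, V, C](
    createCombiner: V => C,
    mergeValue: (C, V) => C,
    mergeCombiners: (C, C) => C) {

  // mapSideCombine disabled: shuffle records are raw (K, V) pairs.
  def combineValuesByKey(records: Iterator[(K, V)]): Iterator[(K, C)] = {
    val combiners = mutable.Map.empty[K, C]
    records.foreach { case (k, v) =>
      combiners.update(k, combiners.get(k).fold(createCombiner(v))(mergeValue(_, v)))
    }
    combiners.iterator
  }

  // mapSideCombine enabled: shuffle records are already partial (K, C) combiners.
  def combineCombinersByKey(records: Iterator[(K, C)]): Iterator[(K, C)] = {
    val combiners = mutable.Map.empty[K, C]
    records.foreach { case (k, c) =>
      combiners.update(k, combiners.get(k).fold(c)(mergeCombiners(_, c)))
    }
    combiners.iterator
  }
}
```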

NOTE: read throws an exception when the ShuffleDependency has no Aggregator defined but the mapSideCombine flag is enabled.

When keyOrdering is defined in the ShuffleDependency, read does the following:

  1. Creates an ExternalSorter
  2. Inserts all the records into the ExternalSorter
  3. Updates the context TaskMetrics
  4. Returns a CompletionIterator for the ExternalSorter
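The steps above can be sketched with an in-memory stand-in for ExternalSorter. The real ExternalSorter can spill to disk and reports spill metrics (step 3); this sketch only preserves the insert-then-sorted-iterator shape, and all names are illustrative.

```scala
import scala.collection.mutable.ArrayBuffer

// In-memory stand-in for the ExternalSorter steps above.
class SimpleSorter[K, C](ord: Ordering[K]) {
  private val buf = ArrayBuffer.empty[(K, C)]

  // Step 2: insert all the records
  def insertAll(records: Iterator[(K, C)]): Unit = buf ++= records

  // Step 3 would update TaskMetrics with spill sizes; nothing spills here
  def memoryBytesSpilled: Long = 0L

  // Step 4: hand back the records sorted by key
  def iterator: Iterator[(K, C)] = buf.sortBy(_._1)(ord).iterator
}
```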