BlockStoreShuffleReader¶

BlockStoreShuffleReader[K, C] is a ShuffleReader of K keys and C values.

Creating Instance¶

BlockStoreShuffleReader takes the following to be created:

BaseShuffleHandle
Blocks by Address (Iterator[(BlockManagerId, Seq[(BlockId, Long, Int)])])
TaskContext
ShuffleReadMetricsReporter
SerializerManager
BlockManager
MapOutputTracker
shouldBatchFetch flag (default: false)

BlockStoreShuffleReader is created when:

SortShuffleManager is requested for a ShuffleReader (for a ShuffleHandle and a range of reduce partitions)

Reading Combined Records (for Reduce Task)¶

ShuffleReader

read(): Iterator[Product2[K, C]]

read is part of the ShuffleReader abstraction.

read creates a ShuffleBlockFetcherIterator.

read...FIXME

fetchContinuousBlocksInBatch¶

fetchContinuousBlocksInBatch: Boolean

fetchContinuousBlocksInBatch reads the following configuration properties to determine whether continuous shuffle block fetching could be used or not:

spark.io.encryption.enabled
spark.shuffle.compress
spark.shuffle.useOldFetchProtocol
supportsRelocationOfSerializedObjects (of the Serializer of the ShuffleDependency of this BaseShuffleHandle)

fetchContinuousBlocksInBatch prints out the following DEBUG message when continuous shuffle block fetching is requested yet not satisfied by the configuration:

The feature tag of continuous shuffle block fetching is set to true, but
we can not enable the feature because other conditions are not satisfied.
Shuffle compress: [compressed], serializer relocatable: [serializerRelocatable],
codec concatenation: [codecConcatenation], use old shuffle fetch protocol:
[useOldFetchProtocol], io encryption: [ioEncryption].