BlockId¶

BlockId is an abstraction of data block identifiers based on a unique name.

Contract¶

Name¶

name: String

A globally unique identifier of this Block

Used when:

BlockManager is requested to putBlockDataAsStream and readDiskBlockFromSameHostExecutor
UpdateBlockInfo is requested to writeExternal
DiskBlockManager is requested to getFile and containsBlock
DiskStore is requested to getBytes, remove, moveFileToBlock and contains

Implementations¶

Sealed Abstract Class

BlockId is a Scala sealed abstract class, which means that all of its direct implementations are in the same compilation unit (a single file).

Learn more in the Scala Language Specification.
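
The following is a minimal sketch of what the sealed-class contract looks like, not the Spark source. All names in it (SketchBlockId, ExampleDataId, ExampleTempId, describe) are hypothetical and exist only for illustration. It shows the single abstract member (name) and how a sealed hierarchy lets the compiler check that a pattern match covers every implementation.

```scala
object SealedBlockIdSketch {
  // Hypothetical stand-in for BlockId: the only abstract member is `name`.
  sealed abstract class SketchBlockId {
    def name: String
    override def toString: String = name
  }

  // Hypothetical implementations; `sealed` requires them to live in this same file.
  final case class ExampleDataId(id: Int) extends SketchBlockId {
    override def name: String = s"example_data_$id"
  }
  final case class ExampleTempId(label: String) extends SketchBlockId {
    override def name: String = s"example_temp_$label"
  }

  // Because the hierarchy is sealed, the compiler can warn if a case is missing here.
  def describe(blockId: SketchBlockId): String = blockId match {
    case ExampleDataId(id)    => s"data block $id"
    case ExampleTempId(label) => s"temporary block $label"
  }
}
```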

BroadcastBlockId¶

BlockId for broadcast variable blocks:

broadcastId identifier
Optional field name (default: empty)

Uses broadcast_ prefix for the name

Used when:

TorrentBroadcast is created, and is requested to store a broadcast and its blocks in a local BlockManager and to read blocks
BlockManager is requested to remove all the blocks of a broadcast variable
SerializerManager is requested to shouldCompress
AppStatusListener is requested to onBlockUpdated
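
A short sketch of how the broadcast_ prefix and the optional field could combine into a name. SketchBroadcastBlockId is a hypothetical stand-in, and both the Long type of broadcastId and the underscore separator before a non-empty field are assumptions rather than something stated above.

```scala
object BroadcastNameSketch {
  // Hypothetical stand-in for BroadcastBlockId; `field` defaults to empty.
  final case class SketchBroadcastBlockId(broadcastId: Long, field: String = "") {
    // Assumption: a non-empty field is appended after an underscore.
    def name: String =
      "broadcast_" + broadcastId + (if (field.isEmpty) "" else "_" + field)
  }
}
```

With these assumptions, an identifier without a field yields a name such as broadcast_0, while one with the field piece1 yields broadcast_0_piece1.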

CacheId¶

BlockId for...FIXME

PythonStreamBlockId¶

BlockId for...FIXME

RDDBlockId¶

BlockId for RDD partitions:

rddId identifier
splitIndex identifier

Uses rdd_ prefix for the name

Used when:

StorageStatus is requested to register the status of a data block, get the status of a data block and updateStorageInfo
LocalRDDCheckpointData is requested to doCheckpoint
RDD is requested to getOrCompute
DAGScheduler is requested for the BlockManagers (executors) for cached RDD partitions
BlockManagerMasterEndpoint is requested to removeRdd
AppStatusListener is requested to updateRDDBlock (when onBlockUpdated for an RDDBlockId)

Compressed when spark.rdd.compress configuration property is enabled
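
As an illustration, the name of an RDD block can be sketched from its two identifiers. SketchRDDBlockId is hypothetical, and the Int types and the underscore-separated layout after the rdd_ prefix are assumptions.

```scala
object RddNameSketch {
  // Hypothetical stand-in for RDDBlockId: one block per (RDD, partition) pair.
  final case class SketchRDDBlockId(rddId: Int, splitIndex: Int) {
    // Assumed layout: rdd_[rddId]_[splitIndex]
    def name: String = s"rdd_${rddId}_${splitIndex}"
  }
}
```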

ShuffleBlockBatchId¶

BlockId for...FIXME

ShuffleBlockChunkId¶

BlockId for shuffle block chunks in Push-Based Shuffle:

shuffleId identifier
shuffleMergeId identifier
reduceId identifier
chunkId identifier

Uses shuffleChunk_[shuffleId]_[shuffleMergeId]_[reduceId]_[chunkId] pattern for the name
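
The name pattern above can be sketched as a simple string interpolation. SketchShuffleBlockChunkId is a hypothetical stand-in, and the Int types of the four identifiers are assumptions.

```scala
object ShuffleChunkNameSketch {
  // Hypothetical stand-in for ShuffleBlockChunkId.
  final case class SketchShuffleBlockChunkId(
      shuffleId: Int,
      shuffleMergeId: Int,
      reduceId: Int,
      chunkId: Int) {
    // Follows the shuffleChunk_[shuffleId]_[shuffleMergeId]_[reduceId]_[chunkId] pattern.
    def name: String =
      s"shuffleChunk_${shuffleId}_${shuffleMergeId}_${reduceId}_${chunkId}"
  }
}
```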

ShuffleBlockId¶

BlockId for shuffle blocks:

shuffleId identifier
mapId identifier
reduceId identifier

Uses shuffle_ prefix for the name

Used when:

ShuffleBlockFetcherIterator is requested to throwFetchFailedException
MapOutputTracker utility is requested to convertMapStatuses
NettyBlockRpcServer is requested to handle a FetchShuffleBlocks message
ExternalSorter is requested to writePartitionedMapOutput
ShuffleBlockFetcherIterator is requested to mergeContinuousShuffleBlockIdsIfNeeded
IndexShuffleBlockResolver is requested to getBlockData

Compressed when spark.shuffle.compress configuration property is enabled
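
A rough sketch of how a compression decision can depend on the type of a block identifier. This is not the SerializerManager code: the Sketch* types, the flag helper and the plain Map used in place of a Spark configuration are all hypothetical, and the defaults (false for spark.rdd.compress, true for spark.shuffle.compress) are assumptions taken from the Spark configuration reference.

```scala
object CompressionSketch {
  // Hypothetical, reduced hierarchy of block identifiers.
  sealed abstract class SketchBlockId
  final case class SketchRDDBlockId(rddId: Int, splitIndex: Int) extends SketchBlockId
  final case class SketchShuffleBlockId(shuffleId: Int, mapId: Long, reduceId: Int)
    extends SketchBlockId
  final case class SketchOtherBlockId(label: String) extends SketchBlockId

  // Hypothetical helper: reads a boolean property from a plain Map.
  private def flag(conf: Map[String, String], key: String, default: Boolean): Boolean =
    conf.get(key).map(_.toBoolean).getOrElse(default)

  // Picks the configuration property by the type of the block identifier.
  def shouldCompress(blockId: SketchBlockId, conf: Map[String, String]): Boolean =
    blockId match {
      case _: SketchRDDBlockId     => flag(conf, "spark.rdd.compress", default = false)
      case _: SketchShuffleBlockId => flag(conf, "spark.shuffle.compress", default = true)
      case _: SketchOtherBlockId   => false // other block types are left out of this sketch
    }
}
```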

ShuffleChecksumBlockId¶

BlockId for...FIXME

ShuffleDataBlockId¶

BlockId for...FIXME

ShuffleIndexBlockId¶

BlockId for...FIXME

ShuffleMergedBlockId¶

BlockId for...FIXME

ShuffleMergedDataBlockId¶

BlockId for...FIXME

ShuffleMergedIndexBlockId¶

BlockId for...FIXME

ShuffleMergedMetaBlockId¶

BlockId for...FIXME

ShufflePushBlockId¶

BlockId for...FIXME

StreamBlockId¶

BlockId for...FIXME:

streamId
uniqueId

Uses the following name:

input-[streamId]-[uniqueId]

Used in Spark Streaming

TaskResultBlockId¶

BlockId for...FIXME

TempLocalBlockId¶

BlockId for...FIXME

TempShuffleBlockId¶

BlockId for...FIXME

TestBlockId¶

BlockId for...FIXME

Creating BlockId by Name¶

apply(
  name: String): BlockId

apply creates the BlockId that matches the given name. The prefix of the name determines which BlockId implementation is created.

apply is used when:

NettyBlockRpcServer is requested to handle OpenBlocks and UploadBlock messages, and to receiveStream
UpdateBlockInfo is requested to deserialize (readExternal)
DiskBlockManager is requested for all the blocks (from files stored on disk)
ShuffleBlockFetcherIterator is requested to sendRequest
JsonProtocol utility is used to accumValueFromJson, taskMetricsFromJson and blockUpdatedInfoFromJson
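
Prefix-based creation can be sketched with regular-expression extractors. This is an illustration under stated assumptions, not the actual apply: the Sketch* types, the parse method, the exact name layouts and the IllegalArgumentException thrown for unknown names are all hypothetical, and only three of the block types are covered.

```scala
object BlockIdByNameSketch {
  sealed abstract class SketchBlockId { def name: String }

  final case class SketchRDDBlockId(rddId: Int, splitIndex: Int) extends SketchBlockId {
    override def name: String = s"rdd_${rddId}_${splitIndex}"
  }
  final case class SketchShuffleBlockId(shuffleId: Int, mapId: Long, reduceId: Int)
    extends SketchBlockId {
    override def name: String = s"shuffle_${shuffleId}_${mapId}_${reduceId}"
  }
  final case class SketchBroadcastBlockId(broadcastId: Long, field: String = "")
    extends SketchBlockId {
    override def name: String =
      "broadcast_" + broadcastId + (if (field.isEmpty) "" else "_" + field)
  }

  // One pattern per prefix; a Regex used as an extractor must match the whole name.
  private val Rdd = "rdd_([0-9]+)_([0-9]+)".r
  private val Shuffle = "shuffle_([0-9]+)_([0-9]+)_([0-9]+)".r
  private val Broadcast = "broadcast_([0-9]+)([_A-Za-z0-9]*)".r

  // Hypothetical counterpart of apply: picks the implementation by the name's prefix.
  def parse(name: String): SketchBlockId = name match {
    case Rdd(rddId, splitIndex) =>
      SketchRDDBlockId(rddId.toInt, splitIndex.toInt)
    case Shuffle(shuffleId, mapId, reduceId) =>
      SketchShuffleBlockId(shuffleId.toInt, mapId.toLong, reduceId.toInt)
    case Broadcast(broadcastId, field) =>
      SketchBroadcastBlockId(broadcastId.toLong, field.stripPrefix("_"))
    case _ =>
      throw new IllegalArgumentException(s"Unrecognized block name: $name")
  }
}
```

In this sketch, parsing the name of an identifier gives back an equal identifier, which is the property that lets a block name travel across the network or be stored as a file name and later be turned back into a typed identifier.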