BlockId¶
BlockId is an abstraction of data block identifiers based on an unique name.
Contract¶
Name¶
name: String
A globally unique identifier of this Block
Used when:
BlockManageris requested to putBlockDataAsStream and readDiskBlockFromSameHostExecutorUpdateBlockInfois requested to writeExternalDiskBlockManageris requested to getFile and containsBlockDiskStoreis requested to getBytes, remove, moveFileToBlock, contains
Implementations¶
Sealed Abstract Class
BlockId is a Scala sealed abstract class which means that all of the implementations are in the same compilation unit (a single file).
Learn more in the Scala Language Specification.
BroadcastBlockId¶
BlockId for broadcast variable blocks:
broadcastIdidentifier- Optional
fieldname (default:empty)
Uses broadcast_ prefix for the name
Used when:
TorrentBroadcastis created, requested to store a broadcast and the blocks in a local BlockManager, and read blocksBlockManageris requested to remove all the blocks of a broadcast variableSerializerManageris requested to shouldCompressAppStatusListeneris requested to onBlockUpdated
CacheId¶
BlockId for...FIXME
PythonStreamBlockId¶
BlockId for...FIXME
RDDBlockId¶
BlockId for RDD partitions:
rddIdidentifiersplitIndexidentifier
Uses rdd_ prefix for the name
Used when:
StorageStatusis requested to register the status of a data block, get the status of a data block, updateStorageInfoLocalRDDCheckpointDatais requested to doCheckpointRDDis requested to getOrComputeDAGScheduleris requested for the BlockManagers (executors) for cached RDD partitionsBlockManagerMasterEndpointis requested to removeRddAppStatusListeneris requested to updateRDDBlock (when onBlockUpdated for anRDDBlockId)
Compressed when spark.rdd.compress configuration property is enabled
ShuffleBlockBatchId¶
BlockId for...FIXME
ShuffleBlockChunkId¶
BlockId for shuffle block chunks in Push-Based Shuffle:
shuffleIdidentifiershuffleMergeIdidentifierreduceIdidentifierchunkIdidentifier
Uses shuffleChunk_[shuffleId]_[shuffleMergeId]_[reduceId]_[chunkId] pattern for the name
ShuffleBlockId¶
BlockId for shuffle blocks:
shuffleIdidentifiermapIdidentifierreduceIdidentifier
Uses shuffle_ prefix for the name
Used when:
ShuffleBlockFetcherIteratoris requested to throwFetchFailedExceptionMapOutputTrackerutility is requested to convertMapStatusesNettyBlockRpcServeris requested to handle a FetchShuffleBlocks messageExternalSorteris requested to writePartitionedMapOutputShuffleBlockFetcherIteratoris requested to mergeContinuousShuffleBlockIdsIfNeededIndexShuffleBlockResolveris requested to getBlockData
Compressed when spark.shuffle.compress configuration property is enabled
ShuffleChecksumBlockId¶
BlockId for...FIXME
ShuffleDataBlockId¶
BlockId for...FIXME
ShuffleIndexBlockId¶
BlockId for...FIXME
ShuffleMergedBlockId¶
BlockId for...FIXME
ShuffleMergedDataBlockId¶
BlockId for...FIXME
ShuffleMergedIndexBlockId¶
BlockId for...FIXME
ShuffleMergedMetaBlockId¶
BlockId for...FIXME
ShufflePushBlockId¶
BlockId for...FIXME
StreamBlockId¶
BlockId for...FIXME:
streamIduniqueId
Uses the following name:
input-[streamId]-[uniqueId]
Used in Spark Streaming
TaskResultBlockId¶
BlockId for...FIXME
TempLocalBlockId¶
BlockId for...FIXME
TempShuffleBlockId¶
BlockId for...FIXME
TestBlockId¶
BlockId for...FIXME
Creating BlockId by Name¶
apply(
name: String): BlockId
apply creates one of the available BlockIds by the given name (that uses a prefix to differentiate between different BlockIds).
apply is used when:
NettyBlockRpcServeris requested to handle OpenBlocks, UploadBlock messages and receiveStreamUpdateBlockInfois requested to deserialize (readExternal)DiskBlockManageris requested for all the blocks (from files stored on disk)ShuffleBlockFetcherIteratoris requested to sendRequestJsonProtocolutility is used to accumValueFromJson, taskMetricsFromJson and blockUpdatedInfoFromJson