BlockId¶
BlockId is an abstraction of data block identifiers based on a unique name.
Contract¶
Name¶
name: String
A globally unique identifier of this Block
Used when:
* BlockManager is requested to putBlockDataAsStream and readDiskBlockFromSameHostExecutor
* UpdateBlockInfo is requested to writeExternal
* DiskBlockManager is requested to getFile and containsBlock
* DiskStore is requested to getBytes, remove, moveFileToBlock, contains
Implementations¶
Sealed Abstract Class
BlockId is a Scala sealed abstract class, which means that all of the implementations are in the same compilation unit (a single file).
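A condensed sketch of that pattern, modeled on (but not copied verbatim from) org.apache.spark.storage.BlockId, shows how the sealed abstract class and the name contract fit together:

```scala
// Condensed sketch modeled on org.apache.spark.storage.BlockId (not the full Spark source)
sealed abstract class BlockId {
  // A globally unique identifier of this block
  def name: String
}

// Every implementation has to live in this very file (enforced by `sealed`)
// and derives its name from its identifiers.
case class TestBlockId(id: String) extends BlockId {
  override def name: String = "test_" + id
}
```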
BroadcastBlockId¶
BlockId for broadcast variable blocks:
* broadcastId identifier
* Optional field name (default: empty)
Uses broadcast_ prefix for the name
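As a quick sketch (in spark-shell; constructor parameter types can differ slightly between Spark versions), constructing a BroadcastBlockId and inspecting its name shows the convention, with and without the optional field:

```scala
import org.apache.spark.storage.BroadcastBlockId

BroadcastBlockId(0).name            // broadcast_0
BroadcastBlockId(0, "piece0").name  // broadcast_0_piece0
```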
Used when:
* TorrentBroadcast is created, requested to store a broadcast and the blocks in a local BlockManager, and read blocks
* BlockManager is requested to remove all the blocks of a broadcast variable
* SerializerManager is requested to shouldCompress
* AppStatusListener is requested to onBlockUpdated
RDDBlockId¶
BlockId for RDD partitions:
* rddId identifier
* splitIndex identifier
Uses rdd_ prefix for the name
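A quick sketch in spark-shell (parameter names as above):

```scala
import org.apache.spark.storage.RDDBlockId

RDDBlockId(rddId = 42, splitIndex = 3).name  // rdd_42_3
```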
Used when:
* StorageStatus is requested to register the status of a data block, get the status of a data block, updateStorageInfo
* LocalRDDCheckpointData is requested to doCheckpoint
* RDD is requested to getOrCompute
* DAGScheduler is requested for the BlockManagers (executors) for cached RDD partitions
* BlockManagerMasterEndpoint is requested to removeRdd
* AppStatusListener is requested to updateRDDBlock (when onBlockUpdated for an RDDBlockId)
Compressed when spark.rdd.compress configuration property is enabled
ShuffleBlockBatchId¶
ShuffleBlockId¶
BlockId for shuffle blocks:
* shuffleId identifier
* mapId identifier
* reduceId identifier
Uses shuffle_ prefix for the name
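A quick sketch in spark-shell (mapId is a Long in recent Spark versions; the integer literal widens automatically):

```scala
import org.apache.spark.storage.ShuffleBlockId

ShuffleBlockId(shuffleId = 0, mapId = 5, reduceId = 2).name  // shuffle_0_5_2
```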
Used when:
* ShuffleBlockFetcherIterator is requested to throwFetchFailedException
* MapOutputTracker utility is requested to convertMapStatuses
* NettyBlockRpcServer is requested to handle a FetchShuffleBlocks message
* ExternalSorter is requested to writePartitionedMapOutput
* ShuffleBlockFetcherIterator is requested to mergeContinuousShuffleBlockIdsIfNeeded
* IndexShuffleBlockResolver is requested to getBlockData
Compressed when spark.shuffle.compress configuration property is enabled
ShuffleDataBlockId¶
ShuffleIndexBlockId¶
StreamBlockId¶
BlockId for ...FIXME:
* streamId
* uniqueId
Uses the following name:
input-[streamId]-[uniqueId]
Used in Spark Streaming
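A quick sketch in spark-shell:

```scala
import org.apache.spark.storage.StreamBlockId

StreamBlockId(streamId = 1, uniqueId = 100).name  // input-1-100
```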
TaskResultBlockId¶
TempLocalBlockId¶
TempShuffleBlockId¶
TestBlockId¶
Creating BlockId by Name¶
apply(name: String): BlockId
apply creates one of the available BlockIds by the given name (that uses a prefix to differentiate between different BlockIds).
apply is used when:
* NettyBlockRpcServer is requested to handle OpenBlocks and UploadBlock messages, and receiveStream
* UpdateBlockInfo is requested to deserialize (readExternal)
* DiskBlockManager is requested for all the blocks (from files stored on disk)
* ShuffleBlockFetcherIterator is requested to sendRequest
* JsonProtocol utility is used to accumValueFromJson, taskMetricsFromJson and blockUpdatedInfoFromJson
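A sketch of the round trip in spark-shell: apply (available as BlockId(...)) parses a textual name back into the concrete BlockId, and a name that matches no known prefix is rejected with an exception.

```scala
import org.apache.spark.storage.BlockId

BlockId("rdd_42_3")       // RDDBlockId(42,3)
BlockId("shuffle_0_5_2")  // ShuffleBlockId(0,5,2)
BlockId("test_whatever")  // TestBlockId(whatever)
```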