= BlockId

BlockId is an identifier of a block of data based on a unique name.

BlockId is a Scala sealed abstract class, so all of the available implementations are defined in the same Scala file as BlockId itself.
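
A minimal sketch of the idea (not Spark's actual source; RDDBlockId and its name format are shown for illustration only):

[source, scala]
----
// Sketch of the BlockId hierarchy: `sealed` means every subclass must live
// in this same source file, so pattern matches over BlockId can be checked
// for exhaustiveness by the compiler.
sealed abstract class BlockId {
  // The contract: every BlockId has a globally unique name.
  def name: String
  override def toString: String = name
}

// One concrete BlockId: an RDD partition, named after its rdd and split ids.
case class RDDBlockId(rddId: Int, splitIndex: Int) extends BlockId {
  def name: String = "rdd_" + rddId + "_" + splitIndex
}
----

With the hierarchy sealed, adding a new kind of block is a change to this one file, and the compiler flags any match that forgets a case.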

== [[contract]] Contract

=== [[name]][[toString]] Unique Name

[source, scala]
----
name: String
----

Used when:

  • NettyBlockTransferService is requested to upload a block

  • AppStatusListener is requested to updateRDDBlock and updateStreamBlock

  • BlockManager is requested to putBlockDataAsStream

  • UpdateBlockInfo is requested to writeExternal

  • DiskBlockManager is requested to getFile and containsBlock

  • DiskStore is requested to getBytes

== [[implementations]] Available BlockIds

=== [[BroadcastBlockId]] BroadcastBlockId

BlockId for broadcast variable blocks with a broadcastId identifier and an optional field name (default: empty)

Uses the broadcast_ prefix for the unique name

Used when:

  • TorrentBroadcast is created, requested to store a broadcast and the blocks in a local BlockManager, and <>

  • BlockManager is requested to remove all the blocks of a broadcast variable

  • AppStatusListener is requested to updateBroadcastBlock (when onBlockUpdated for a BroadcastBlockId)

Compressed when the spark.broadcast.compress configuration property is enabled
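
As a sketch of how the unique name is assembled from the broadcast_ prefix and the optional field (illustrative, not Spark's exact source):

[source, scala]
----
// Illustrative BroadcastBlockId: the name is broadcast_<broadcastId>,
// with an _<field> suffix only when a field name is given.
case class BroadcastBlockId(broadcastId: Long, field: String = "") {
  def name: String =
    "broadcast_" + broadcastId + (if (field == "") "" else "_" + field)
}
----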

=== [[RDDBlockId]] RDDBlockId

BlockId for RDD partitions with rddId and splitIndex identifiers

Uses the rdd_ prefix for the unique name

Used when:

  • StorageStatus is requested to <>, <>, <>

  • LocalCheckpointRDD is requested to compute a partition

  • LocalRDDCheckpointData is requested to doCheckpoint

  • RDD is requested to getOrCompute

  • DAGScheduler is requested for the BlockManagers (executors) for cached RDD partitions

  • AppStatusListener is requested to updateRDDBlock (when onBlockUpdated for an RDDBlockId)

Compressed when the spark.rdd.compress configuration property is enabled (default: false)

=== [[ShuffleBlockId]] ShuffleBlockId

BlockId for shuffle blocks with shuffleId, mapId, and reduceId identifiers

Uses the shuffle_ prefix for the unique name

Used when:

  • ShuffleBlockFetcherIterator is requested to throwFetchFailedException

  • MapOutputTracker object is requested to convertMapStatuses

  • SortShuffleWriter is requested to write partition records

  • ShuffleBlockResolver is requested for a ManagedBuffer to read shuffle block data file

Compressed when the spark.shuffle.compress configuration property is enabled (default: true)
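
A sketch of the name format for shuffle blocks, combining all three identifiers under the shuffle_ prefix (illustrative, not Spark's exact source):

[source, scala]
----
// Illustrative ShuffleBlockId: named shuffle_<shuffleId>_<mapId>_<reduceId>,
// so a single String pinpoints one reducer's slice of one map task's output.
case class ShuffleBlockId(shuffleId: Int, mapId: Int, reduceId: Int) {
  def name: String = "shuffle_" + shuffleId + "_" + mapId + "_" + reduceId
}
----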

=== [[ShuffleDataBlockId]] ShuffleDataBlockId

=== [[ShuffleIndexBlockId]] ShuffleIndexBlockId

=== [[StreamBlockId]] StreamBlockId

=== [[TaskResultBlockId]] TaskResultBlockId

=== [[TempLocalBlockId]] TempLocalBlockId

=== [[TempShuffleBlockId]] TempShuffleBlockId

== [[apply]] apply Factory Method

[source, scala]
----
apply(name: String): BlockId
----

apply creates one of the available BlockIds for a given name (using the name's prefix to differentiate between different BlockIds).

apply is used when:

  • NettyBlockRpcServer is requested to handle an RPC message and receiveStream

  • UpdateBlockInfo is requested to deserialize (readExternal)

  • DiskBlockManager is requested for all the blocks (from files stored on disk)

  • ShuffleBlockFetcherIterator is requested to sendRequest

  • JsonProtocol utility is used to accumValueFromJson, taskMetricsFromJson, and blockUpdatedInfoFromJson
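
The prefix-based dispatch can be sketched as regex matching on the name (the regexes and the two sample classes below are illustrative stand-ins, not Spark's exact source):

[source, scala]
----
// Sketch of a BlockId.apply-style factory: the prefix of the name
// (rdd_, shuffle_, ...) selects which concrete BlockId to build.
sealed abstract class BlockId { def name: String }

case class RDDBlockId(rddId: Int, splitIndex: Int) extends BlockId {
  def name: String = "rdd_" + rddId + "_" + splitIndex
}

case class ShuffleBlockId(shuffleId: Int, mapId: Int, reduceId: Int) extends BlockId {
  def name: String = "shuffle_" + shuffleId + "_" + mapId + "_" + reduceId
}

object BlockId {
  private val RddRe     = "rdd_([0-9]+)_([0-9]+)".r
  private val ShuffleRe = "shuffle_([0-9]+)_([0-9]+)_([0-9]+)".r

  // Parse a unique name back into a concrete BlockId by its prefix.
  def apply(name: String): BlockId = name match {
    case RddRe(rddId, splitIndex) =>
      RDDBlockId(rddId.toInt, splitIndex.toInt)
    case ShuffleRe(shuffleId, mapId, reduceId) =>
      ShuffleBlockId(shuffleId.toInt, mapId.toInt, reduceId.toInt)
    case _ =>
      throw new IllegalArgumentException(s"Unrecognized BlockId: $name")
  }
}
----

Round-tripping holds by construction: BlockId(someBlockId.name) rebuilds an equal BlockId, which is what lets callers like DiskBlockManager recover BlockIds from file names.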

Last update: 2020-10-06