Skip to content

BlockManagerMaster

BlockManagerMaster runs on the driver and executors to exchange block metadata (status and locations) in a Spark application.

BlockManagerMaster uses BlockManagerMasterEndpoint (registered as BlockManagerMaster RPC endpoint on the driver with the endpoint references on executors) for executors to send block status updates and so let the driver keep track of block status and locations.

Creating Instance

BlockManagerMaster takes the following to be created:

BlockManagerMaster is created when:

Driver Endpoint

BlockManagerMaster is given a RpcEndpointRef of the BlockManagerMaster RPC Endpoint (on the driver) when created.

Heartbeat Endpoint

BlockManagerMaster is given a RpcEndpointRef of the BlockManagerMasterHeartbeat RPC Endpoint (on the driver) when created.

The endpoint is used (mainly) when:

Registering BlockManager (on Executor) with Driver

registerBlockManager(
  id: BlockManagerId,
  localDirs: Array[String],
  maxOnHeapMemSize: Long,
  maxOffHeapMemSize: Long,
  storageEndpoint: RpcEndpointRef): BlockManagerId

registerBlockManager prints out the following INFO message to the logs (with the given BlockManagerId):

Registering BlockManager [id]

Registering BlockManager with the Driver

registerBlockManager notifies the driver (using the BlockManagerMaster RPC endpoint) that the BlockManagerId wants to register (and sends a blocking RegisterBlockManager message).

Note

The input maxMemSize is the total available on-heap and off-heap memory for storage on the BlockManager.

registerBlockManager waits until a confirmation comes (as a possibly-updated BlockManagerId).

In the end, registerBlockManager prints out the following INFO message to the logs and returns the BlockManagerId received.

Registered BlockManager [updatedId]

registerBlockManager is used when:

Finding Block Locations for Single Block

getLocations(
  blockId: BlockId): Seq[BlockManagerId]

getLocations requests the driver (using the BlockManagerMaster RPC endpoint) for BlockManagerIds of the given BlockId (and sends a blocking GetLocations message).

getLocations is used when:

Finding Block Locations for Multiple Blocks

getLocations(
  blockIds: Array[BlockId]): IndexedSeq[Seq[BlockManagerId]]

getLocations requests the driver (using the BlockManagerMaster RPC endpoint) for BlockManagerIds of the given BlockIds (and sends a blocking GetLocationsMultipleBlockIds message).

getLocations is used when:

contains

contains(
  blockId: BlockId): Boolean

contains is positive (true) when there is at least one executor with the given BlockId.

contains is used when:

Logging

Enable ALL logging level for org.apache.spark.storage.BlockManagerMaster logger to see what happens inside.

Add the following line to conf/log4j.properties:

log4j.logger.org.apache.spark.storage.BlockManagerMaster=ALL

Refer to Logging.


Last update: 2021-07-14
Back to top