BlockManagerMaster runs on the driver and executors to exchange block metadata (status and locations) in a Spark application.
BlockManagerMaster uses BlockManagerMasterEndpoint (registered as BlockManagerMaster RPC endpoint on the driver with the endpoint references on executors) for executors to send block status updates and so let the driver keep track of block status and locations.
BlockManagerMaster takes the following to be created:
- Driver Endpoint
- Heartbeat Endpoint
isDriverflag (whether it is created for the driver or executors)
BlockManagerMaster is created when:
SparkEnvutility is used to create a SparkEnv (and create a BlockManager)
BlockManagerMaster is given a RpcEndpointRef of the BlockManagerMaster RPC Endpoint (on the driver) when created.
BlockManagerMaster is given a RpcEndpointRef of the BlockManagerMasterHeartbeat RPC Endpoint (on the driver) when created.
The endpoint is used (mainly) when:
DAGScheduleris requested to executorHeartbeatReceived
Registering BlockManager (on Executor) with Driver¶
registerBlockManager( id: BlockManagerId, localDirs: Array[String], maxOnHeapMemSize: Long, maxOffHeapMemSize: Long, storageEndpoint: RpcEndpointRef): BlockManagerId
registerBlockManager prints out the following INFO message to the logs (with the given BlockManagerId):
Registering BlockManager [id]
registerBlockManager notifies the driver (using the BlockManagerMaster RPC endpoint) that the BlockManagerId wants to register (and sends a blocking RegisterBlockManager message).
maxMemSize is the total available on-heap and off-heap memory for storage on the
registerBlockManager waits until a confirmation comes (as a possibly-updated BlockManagerId).
In the end,
registerBlockManager prints out the following INFO message to the logs and returns the BlockManagerId received.
Registered BlockManager [updatedId]
registerBlockManager is used when:
BlockManageris requested to initialize and reregister
FallbackStorageutility is used to registerBlockManagerIfNeeded
Finding Block Locations for Single Block¶
getLocations( blockId: BlockId): Seq[BlockManagerId]
getLocations requests the driver (using the BlockManagerMaster RPC endpoint) for BlockManagerIds of the given BlockId (and sends a blocking GetLocations message).
getLocations is used when:
BlockManageris requested to fetchRemoteManagedBuffer
BlockManagerMasteris requested to contains a BlockId
Finding Block Locations for Multiple Blocks¶
getLocations( blockIds: Array[BlockId]): IndexedSeq[Seq[BlockManagerId]]
getLocations requests the driver (using the BlockManagerMaster RPC endpoint) for BlockManagerIds of the given BlockIds (and sends a blocking GetLocationsMultipleBlockIds message).
getLocations is used when:
DAGScheduleris requested for BlockManagers (executors) for cached RDD partitions
BlockManageris requested to getLocationBlockIds
BlockManagerutility is used to blockIdsToLocations
contains( blockId: BlockId): Boolean
contains is positive (
true) when there is at least one executor with the given BlockId.
contains is used when:
LocalRDDCheckpointDatais requested to doCheckpoint
ALL logging level for
org.apache.spark.storage.BlockManagerMaster logger to see what happens inside.
Add the following line to
Refer to Logging.