Barrier Coordinator RPC Endpoint
The driver (TaskSchedulerImpl, precisely) sets up a BarrierCoordinator upon startup that BarrierTaskContexts talk to using RequestToSync messages.
BarrierCoordinator tracks the number of tasks to wait for until a barrier stage is complete. Only then is a response sent back to the tasks so that they can continue (until then the tasks are paused, waiting for up to 365 days (!)).
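The counting idea can be illustrated with a minimal, self-contained sketch. It is not Spark's implementation: SimpleBarrier, sync, pending and SimpleBarrierDemo are hypothetical names introduced here, and the reply callbacks stand in for the RPC responses sent back to the waiting tasks.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical, simplified model of a single barrier (not Spark's actual code).
final class SimpleBarrier(numTasks: Int) {
  // Reply callbacks of the tasks that have arrived and are still waiting.
  private val waiting = ArrayBuffer.empty[() => Unit]

  // Called once per task; returns true when this call released the barrier.
  def sync(reply: () => Unit): Boolean = synchronized {
    waiting += reply
    if (waiting.size == numTasks) {
      // Every task has arrived: answer all of them and reset for the next barrier call.
      waiting.foreach(r => r())
      waiting.clear()
      true
    } else {
      false
    }
  }

  // Number of tasks currently waiting at the barrier.
  def pending: Int = synchronized(waiting.size)
}

object SimpleBarrierDemo {
  // Returns the number of replies seen before and after the last task arrives.
  def run(): (Int, Int) = {
    val barrier = new SimpleBarrier(numTasks = 3)
    var released = 0
    barrier.sync(() => released += 1)
    barrier.sync(() => released += 1)
    val before = released // no reply yet: only two of three tasks arrived
    barrier.sync(() => released += 1)
    (before, released)
  }
}
```

In this sketch no task gets a reply until the last one arrives, which mirrors why barrier tasks stay paused until the coordinator has heard from all of them.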
BarrierCoordinator takes the following to be created:
BarrierCoordinator is created when:
- TaskSchedulerImpl is requested to maybeInitBarrierCoordinator
Processing RequestToSync Messages (from Barrier Tasks)
receiveAndReply(context: RpcCallContext): PartialFunction[Any, Unit]
receiveAndReply is part of the RpcEndpoint abstraction.
receiveAndReply handles RequestToSync messages.
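The shape of such a handler (a call context in, a partial function over messages out) can be sketched without Spark on the classpath. SyncRequest, ReplyContext and SyncHandler below are hypothetical stand-ins for illustration; Spark's own RequestToSync and RpcCallContext carry more fields and methods than shown.

```scala
// Hypothetical stand-in for a sync request message (illustration only).
final case class SyncRequest(stageId: Int, taskAttemptId: Long)

// Hypothetical stand-in for the call context used to answer the sender.
trait ReplyContext {
  def reply(response: Any): Unit
}

object SyncHandler {
  // Same shape as receiveAndReply: given a call context,
  // return a partial function that is defined only for the messages it handles.
  def receiveAndReply(context: ReplyContext): PartialFunction[Any, Unit] = {
    case SyncRequest(stageId, taskAttemptId) =>
      context.reply(s"synced stage $stageId for task $taskAttemptId")
  }
}
```

Because the handler is a PartialFunction, any other message type is simply not defined for it, which leaves the RPC layer free to decide what to do with unsupported messages.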
Multiple Tasks and One BarrierCoordinator
states: ConcurrentHashMap[ContextBarrierId, ContextBarrierState]
states registry is used to keep track of all the active barrier stage attempts and the corresponding internal ContextBarrierState.
states is used when:
- onStop to clean up
- cleanupBarrierStage to remove a specific stage attempt
- receiveAndReply to handle RequestToSync messages
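The three uses above can be sketched with a small registry built on ConcurrentHashMap. This is a simplified model under stated assumptions: BarrierId, BarrierState, BarrierRegistry, onRequest and activeAttempts are hypothetical names, and the state here only counts arrivals rather than holding the pending requesters.

```scala
import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical key: one entry per stage attempt (field names are assumptions).
final case class BarrierId(stageId: Int, stageAttemptId: Int)

// Hypothetical per-attempt state that only counts the tasks that have arrived.
final class BarrierState(val numTasks: Int) {
  val arrived = new AtomicInteger(0)
}

final class BarrierRegistry {
  private val states = new ConcurrentHashMap[BarrierId, BarrierState]()

  // Handling a sync request: create the state on first use, then record the arrival.
  def onRequest(id: BarrierId, numTasks: Int): Int = {
    val state = states.computeIfAbsent(id, (_: BarrierId) => new BarrierState(numTasks))
    state.arrived.incrementAndGet()
  }

  // Remove the state of a specific stage attempt.
  def cleanupBarrierStage(id: BarrierId): Unit = {
    states.remove(id)
    ()
  }

  // Drop every tracked stage attempt.
  def onStop(): Unit = states.clear()

  def activeAttempts: Int = states.size()
}
```

computeIfAbsent creates the per-attempt state atomically, so concurrent requests from tasks of the same stage attempt end up sharing one state object.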
SparkListener is used to intercept SparkListenerStageCompleted events.
onStageCompleted(stageCompleted: SparkListenerStageCompleted): Unit
onStageCompleted is part of the SparkListenerInterface abstraction.
onStageCompleted requests cleanupBarrierStage for the stage and the attempt number (based on the given SparkListenerStageCompleted).
Enable ALL logging level for org.apache.spark.BarrierCoordinator logger to see what happens inside.
Add the following lines to the logging configuration:
logger.BarrierCoordinator.name = org.apache.spark.BarrierCoordinator
logger.BarrierCoordinator.level = all
Refer to Logging.