From the scaladoc (it's a
private[spark] class so no way to find it outside the code):
Authority that decides whether tasks can commit output to HDFS. Uses a "first committer wins" policy.
OutputCommitCoordinator is instantiated in both the drivers and executors. On executors, it is configured with a reference to the driver's OutputCommitCoordinatorEndpoint, so requests to commit output will be forwarded to the driver's OutputCommitCoordinator.
This class was introduced in SPARK-4879; see that JIRA issue (and the associated pull requests) for an extensive design discussion.
OutputCommitCoordinator takes the following to be created:
OutputCommitCoordinator is created when:
SparkEnvutility is used to create a SparkEnv on the driver
OutputCommitCoordinator RPC Endpoint¶
OutputCommitCoordinator is registered as OutputCommitCoordinator (with
OutputCommitCoordinatorEndpoint RPC Endpoint) in the RPC Environment on the driver (when
SparkEnv utility is used to create "base" SparkEnv). Executors have an RpcEndpointRef to the endpoint on the driver.
coordinatorRef is used to post an
AskPermissionToCommitOutput (by executors) to the
OutputCommitCoordinator (when canCommit).
coordinatorRef is used to stop the
OutputCommitCoordinator on the driver (when stop).
canCommit( stage: Int, stageAttempt: Int, partition: Int, attemptNumber: Int): Boolean
canCommit creates a
AskPermissionToCommitOutput message and sends it (asynchronously) to the OutputCommitCoordinator RPC Endpoint.
canCommit is used when:
SparkHadoopMapRedUtilis requested to
spark.hadoop.outputCommitCoordination.enabledconfiguration property enabled)
DataWritingSparkTask(Spark SQL) utility is used to
handleAskPermissionToCommit( stage: Int, stageAttempt: Int, partition: Int, attemptNumber: Int): Boolean
handleAskPermissionToCommit is used when:
OutputCommitCoordinatorEndpointis requested to handle a
AskPermissionToCommitOutputmessage (that happens after it was sent out in canCommit)
ALL logging level for
org.apache.spark.scheduler.OutputCommitCoordinator logger to see what happens inside.
Add the following line to
Refer to Logging.