FileBatchWrite
FileBatchWrite is a BatchWrite that uses the given FileCommitProtocol to coordinate a write job (and commit or abort it).
Creating Instance
FileBatchWrite takes the following to be created:
- Hadoop Job
- WriteJobDescription
- FileCommitProtocol (Spark Core)
FileBatchWrite is created when:
FileWrite is requested for a BatchWrite
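For reference, a sketch of the class declaration based on the Spark source (package org.apache.spark.sql.execution.datasources.v2; exact members may vary across Spark versions):

```scala
import org.apache.hadoop.mapreduce.Job
import org.apache.spark.internal.Logging
import org.apache.spark.internal.io.FileCommitProtocol
import org.apache.spark.sql.connector.write.BatchWrite
import org.apache.spark.sql.execution.datasources.WriteJobDescription

class FileBatchWrite(
    job: Job,                          // Hadoop Job
    description: WriteJobDescription,
    committer: FileCommitProtocol)     // Spark Core
  extends BatchWrite with Logging
```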
Aborting Write Job
abort(
  messages: Array[WriterCommitMessage]): Unit
abort requests the FileCommitProtocol to abort the Job.
abort is part of the BatchWrite abstraction.
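The whole body is a delegation to the committer; a sketch based on the Spark source:

```scala
override def abort(messages: Array[WriterCommitMessage]): Unit = {
  // Delegate cleanup of the (Hadoop) Job to the FileCommitProtocol
  committer.abortJob(job)
}
```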
Committing Write Job
commit(
  messages: Array[WriterCommitMessage]): Unit
commit prints out the following INFO message to the logs:
Start to commit write Job [uuid].
commit requests the FileCommitProtocol to commit the Job (with the WriteTaskResults extracted from the given WriterCommitMessages) and measures how long the commit takes.
commit prints out the following INFO message to the logs:
Write Job [uuid] committed. Elapsed time: [duration] ms.
commit then processes the statistics of this write job (with the statistics trackers of the WriteJobDescription).
In the end, commit prints out the following INFO message to the logs:
Finished processing stats for write job [uuid].
commit is part of the BatchWrite abstraction.
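Put together, commit closely follows this sketch (based on the Spark source; Utils.timeTakenMs is org.apache.spark.util.Utils and processStats comes from FileFormatWriter):

```scala
override def commit(messages: Array[WriterCommitMessage]): Unit = {
  val results = messages.map(_.asInstanceOf[WriteTaskResult])
  logInfo(s"Start to commit write Job ${description.uuid}.")
  // Commit the Hadoop job and measure the elapsed time (ms)
  val (_, duration) = Utils.timeTakenMs {
    committer.commitJob(job, results.map(_.commitMsg))
  }
  logInfo(s"Write Job ${description.uuid} committed. Elapsed time: $duration ms.")
  // Let the WriteJobStatsTrackers process the per-task write statistics
  processStats(description.statsTrackers, results.map(_.summary.stats), duration)
  logInfo(s"Finished processing stats for write job ${description.uuid}.")
}
```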
Creating Batch DataWriterFactory
createBatchWriterFactory(
  info: PhysicalWriteInfo): DataWriterFactory
createBatchWriterFactory creates a new FileWriterFactory.
createBatchWriterFactory is part of the BatchWrite abstraction.
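A sketch based on the Spark source (note that the given PhysicalWriteInfo is not used):

```scala
override def createBatchWriterFactory(info: PhysicalWriteInfo): DataWriterFactory = {
  // The PhysicalWriteInfo is ignored; the factory only needs the
  // WriteJobDescription and the FileCommitProtocol
  FileWriterFactory(description, committer)
}
```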
useCommitCoordinator
useCommitCoordinator(): Boolean
FileBatchWrite does not require a Commit Coordinator (and returns false).
useCommitCoordinator is part of the BatchWrite abstraction.
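In code, that is a one-liner (a sketch based on the Spark source):

```scala
// The FileCommitProtocol coordinates task commits on its own, so the
// driver-side OutputCommitCoordinator is not needed
override def useCommitCoordinator(): Boolean = false
```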
Logging
Enable ALL logging level for org.apache.spark.sql.execution.datasources.v2.FileBatchWrite logger to see what happens inside.
Add the following lines to conf/log4j2.properties:
logger.FileBatchWrite.name = org.apache.spark.sql.execution.datasources.v2.FileBatchWrite
logger.FileBatchWrite.level = all
Refer to Logging.