# FileBatchWrite

`FileBatchWrite` is a `BatchWrite` that uses the given `FileCommitProtocol` to coordinate a write job (commit or abort).
## Creating Instance

`FileBatchWrite` takes the following to be created:

* Hadoop `Job`
* `WriteJobDescription`
* `FileCommitProtocol` (Spark Core)

`FileBatchWrite` is created when:

* `FileWrite` is requested for a `BatchWrite`
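
For orientation, the shape of the class can be sketched as follows. This is a simplified sketch based on the parameters listed above; the package locations are assumptions and the actual Spark source may differ in detail.

```scala
import org.apache.hadoop.mapreduce.Job
import org.apache.spark.internal.io.FileCommitProtocol
import org.apache.spark.sql.connector.write.{BatchWrite, DataWriterFactory, PhysicalWriteInfo, WriterCommitMessage}
import org.apache.spark.sql.execution.datasources.WriteJobDescription

// Sketch only: the three constructor arguments listed above.
class FileBatchWrite(
    job: Job,                         // Hadoop Job
    description: WriteJobDescription, // describes the write (uuid, stats trackers, etc.)
    committer: FileCommitProtocol)    // Spark Core's commit protocol
  extends BatchWrite {

  // The BatchWrite contract; each method is covered in a section below.
  override def createBatchWriterFactory(info: PhysicalWriteInfo): DataWriterFactory = ???
  override def commit(messages: Array[WriterCommitMessage]): Unit = ???
  override def abort(messages: Array[WriterCommitMessage]): Unit = ???
  override def useCommitCoordinator(): Boolean = false
}
```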
## Aborting Write Job

```scala
abort(
  messages: Array[WriterCommitMessage]): Unit
```

`abort` requests the `FileCommitProtocol` to abort the `Job`.

`abort` is part of the `BatchWrite` abstraction.
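
Continuing the class sketch above, the delegation could look roughly like this (a sketch that assumes `FileCommitProtocol.abortJob`, not the exact Spark source):

```scala
override def abort(messages: Array[WriterCommitMessage]): Unit = {
  // Job-level abort is delegated entirely to the commit protocol
  // given when this FileBatchWrite was created.
  committer.abortJob(job)
}
```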
## Committing Write Job

```scala
commit(
  messages: Array[WriterCommitMessage]): Unit
```

`commit` prints out the following INFO message to the logs:

```text
Start to commit write Job [uuid].
```

`commit` requests the `FileCommitProtocol` to commit the `Job` (with the `WriteTaskResult`s extracted from the given `WriterCommitMessage`s). `commit` measures the commit duration.

`commit` prints out the following INFO message to the logs:

```text
Write Job [uuid] committed. Elapsed time: [duration] ms.
```

`commit` handles the statistics of this write job.

In the end, `commit` prints out the following INFO message to the logs:

```text
Finished processing stats for write job [uuid].
```

`commit` is part of the `BatchWrite` abstraction.
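
Put together, and continuing the class sketch above (assuming the class mixes in Spark's `Logging` trait for `logInfo`), `commit` could look roughly like the following. Names such as `WriteTaskResult`, its `commitMsg` and `summary.stats` fields, and the `processStats` helper of `FileFormatWriter` are assumptions recalled from memory and may differ across Spark versions; treat this as a sketch.

```scala
import java.util.concurrent.TimeUnit

import org.apache.spark.sql.execution.datasources.FileFormatWriter.processStats
import org.apache.spark.sql.execution.datasources.WriteTaskResult

override def commit(messages: Array[WriterCommitMessage]): Unit = {
  logInfo(s"Start to commit write Job ${description.uuid}.")

  // Every commit message of a file-based write carries a WriteTaskResult.
  val results = messages.map(_.asInstanceOf[WriteTaskResult])

  // Commit the Hadoop job through the FileCommitProtocol and measure the duration.
  val startNs = System.nanoTime()
  committer.commitJob(job, results.map(_.commitMsg).toSeq)
  val durationMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startNs)
  logInfo(s"Write Job ${description.uuid} committed. Elapsed time: $durationMs ms.")

  // Hand the per-task write statistics over to the job's stats trackers.
  processStats(description.statsTrackers, results.map(_.summary.stats).toSeq, durationMs)
  logInfo(s"Finished processing stats for write job ${description.uuid}.")
}
```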
## Creating Batch DataWriterFactory

```scala
createBatchWriterFactory(
  info: PhysicalWriteInfo): DataWriterFactory
```

`createBatchWriterFactory` creates a new `FileWriterFactory`.

`createBatchWriterFactory` is part of the `BatchWrite` abstraction.
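
Continuing the class sketch above, and assuming `FileWriterFactory` is built from the job description and the commit protocol, the override could look roughly like this:

```scala
override def createBatchWriterFactory(info: PhysicalWriteInfo): DataWriterFactory = {
  // The factory is serialized to executors, where it creates a DataWriter
  // per task from the same WriteJobDescription and FileCommitProtocol.
  FileWriterFactory(description, committer)
}
```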
## useCommitCoordinator

```scala
useCommitCoordinator(): Boolean
```

`FileBatchWrite` does not require a commit coordinator (and so `useCommitCoordinator` returns `false`).

`useCommitCoordinator` is part of the `BatchWrite` abstraction.
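
In code form this is a one-line override of `BatchWrite`'s default (a sketch; presumably file sources rely on the `FileCommitProtocol` for commit coordination rather than the driver-side commit coordinator):

```scala
// Returns false: FileBatchWrite does not require a commit coordinator.
override def useCommitCoordinator(): Boolean = false
```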
## Logging

Enable `ALL` logging level for the `org.apache.spark.sql.execution.datasources.v2.FileBatchWrite` logger to see what happens inside.

Add the following lines to `conf/log4j2.properties`:

```text
logger.FileBatchWrite.name = org.apache.spark.sql.execution.datasources.v2.FileBatchWrite
logger.FileBatchWrite.level = all
```

Refer to Logging.