FileBatchWrite

FileBatchWrite is a BatchWrite that uses the given FileCommitProtocol to coordinate a write job (commit or abort).

Creating Instance

FileBatchWrite takes the following to be created:

- Hadoop Job
- WriteJobDescription
- FileCommitProtocol

FileBatchWrite is created when FileWrite is requested for a BatchWrite (toBatch).
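
For reference, the constructor has the following shape in the Spark source (a simplified sketch; parameter names as in the source):

FileBatchWrite(
  job: Job,
  description: WriteJobDescription,
  committer: FileCommitProtocol)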

Aborting Write Job

abort(
  messages: Array[WriterCommitMessage]): Unit

abort requests the FileCommitProtocol to abort the Job.
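
In code form, abort boils down to delegating to the committer (a simplified sketch based on the Spark source; job and committer are the constructor arguments):

override def abort(messages: Array[WriterCommitMessage]): Unit = {
  // Delegate to the FileCommitProtocol; the commit messages are not used
  committer.abortJob(job)
}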

abort is part of the BatchWrite abstraction.

Committing Write Job

commit(
  messages: Array[WriterCommitMessage]): Unit

commit prints out the following INFO message to the logs:

Start to commit write Job [uuid].

commit requests the FileCommitProtocol to commit the Job (with the WriteTaskResults extracted from the given WriterCommitMessages) and measures how long the commit takes.

commit prints out the following INFO message to the logs:

Write Job [uuid] committed. Elapsed time: [duration] ms.

commit then processes the statistics of this write job (the WriteTaskStats of the WriteTaskResults, with the WriteJobStatsTrackers of the WriteJobDescription).

In the end, commit prints out the following INFO message to the logs:

Finished processing stats for write job [uuid].
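
Putting the steps above together, commit is roughly the following (a simplified sketch based on the Spark source; Utils.timeTakenMs and FileFormatWriter.processStats are internal Spark helpers):

override def commit(messages: Array[WriterCommitMessage]): Unit = {
  val results = messages.map(_.asInstanceOf[WriteTaskResult])
  logInfo(s"Start to commit write Job ${description.uuid}.")
  // Commit the Hadoop job and measure how long the commit takes
  val (_, duration) = Utils.timeTakenMs {
    committer.commitJob(job, results.map(_.commitMsg))
  }
  logInfo(s"Write Job ${description.uuid} committed. Elapsed time: $duration ms.")
  // Hand the per-task write statistics over to the job-level stats trackers
  processStats(description.statsTrackers, results.map(_.summary.stats), duration)
  logInfo(s"Finished processing stats for write job ${description.uuid}.")
}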

commit is part of the BatchWrite abstraction.

Creating Batch DataWriterFactory

createBatchWriterFactory(
  info: PhysicalWriteInfo): DataWriterFactory

createBatchWriterFactory creates a new FileWriterFactory (with the WriteJobDescription and the FileCommitProtocol).
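
A simplified sketch based on the Spark source (note that the given PhysicalWriteInfo is not used):

override def createBatchWriterFactory(info: PhysicalWriteInfo): DataWriterFactory =
  FileWriterFactory(description, committer)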

createBatchWriterFactory is part of the BatchWrite abstraction.

useCommitCoordinator

useCommitCoordinator(): Boolean

FileBatchWrite does not require a Commit Coordinator (and returns false).
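
In code form this is a one-liner that overrides the default of the BatchWrite contract (which is true):

override def useCommitCoordinator(): Boolean = false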

useCommitCoordinator is part of the BatchWrite abstraction.

Logging

Enable ALL logging level for org.apache.spark.sql.execution.datasources.v2.FileBatchWrite logger to see what happens inside.

Add the following lines to conf/log4j2.properties:

logger.FileBatchWrite.name = org.apache.spark.sql.execution.datasources.v2.FileBatchWrite
logger.FileBatchWrite.level = all

Refer to Logging.