MicroBatchWrite

MicroBatchWrite is a BatchWrite (Spark SQL) for the WriteToDataSourceV2 logical operator in Micro-Batch Stream Processing.

WriteToMicroBatchDataSource

The WriteToDataSourceV2 logical operator replaces the WriteToMicroBatchDataSource logical operator during logical optimization (by the V2Writes logical optimization).

MicroBatchWrite is a very thin wrapper over StreamingWrite: it does nothing but delegate (relay) all the important execution-specific calls to it.

Creating Instance

MicroBatchWrite takes the following to be created: an epoch ID (epochId) and a StreamingWrite.

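MicroBatchWrite is created when the V2Writes logical optimization replaces a WriteToMicroBatchDataSource logical operator with a WriteToDataSourceV2.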

Committing Writing Job

commit(
  messages: Array[WriterCommitMessage]): Unit

commit is part of the BatchWrite (Spark SQL) abstraction.


commit requests the StreamingWrite to commit.
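The delegation can be illustrated with a minimal sketch. The traits below are simplified stand-ins for the Spark SQL connector interfaces, not the real ones, and FileCommitted and RecordingStreamingWrite are hypothetical names introduced only for this example. The sketch assumes that the wrapper passes the epoch ID it was created with along with the commit messages.

```scala
object CommitSketch {
  // Simplified stand-ins for the Spark SQL connector interfaces (assumption: not the real types).
  trait WriterCommitMessage
  trait StreamingWrite {
    def commit(epochId: Long, messages: Array[WriterCommitMessage]): Unit
  }
  trait BatchWrite {
    def commit(messages: Array[WriterCommitMessage]): Unit
  }

  // The wrapper is bound to a single epoch and relays commit to the streaming write.
  class MicroBatchWrite(epochId: Long, writeSupport: StreamingWrite) extends BatchWrite {
    override def commit(messages: Array[WriterCommitMessage]): Unit =
      writeSupport.commit(epochId, messages)
  }

  // Hypothetical commit message and StreamingWrite used only to observe the delegation.
  final case class FileCommitted(path: String) extends WriterCommitMessage

  class RecordingStreamingWrite extends StreamingWrite {
    // (epoch ID, number of commit messages) for every commit request received
    var committed: Vector[(Long, Int)] = Vector.empty
    override def commit(epochId: Long, messages: Array[WriterCommitMessage]): Unit =
      committed = committed :+ (epochId -> messages.length)
  }

  def run(): Vector[(Long, Int)] = {
    val sink = new RecordingStreamingWrite
    val write = new MicroBatchWrite(epochId = 7L, writeSupport = sink)
    write.commit(Array[WriterCommitMessage](FileCommitted("part-0"), FileCommitted("part-1")))
    sink.committed
  }
}
```

Calling CommitSketch.run() shows that the batch-level commit carries no epoch of its own: the caller only hands over commit messages, and the wrapper supplies the epoch it was created for.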

Creating DataWriterFactory for Batch Write

createBatchWriterFactory(
  info: PhysicalWriteInfo): DataWriterFactory

createBatchWriterFactory is part of the BatchWrite (Spark SQL) abstraction.


createBatchWriterFactory requests the StreamingWrite to create a StreamingDataWriterFactory.

In the end, createBatchWriterFactory creates a MicroBatchWriterFactory (with the given epochId and the StreamingDataWriterFactory).
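The two steps above can be sketched as follows. The traits are again simplified stand-ins rather than the real Spark SQL interfaces, and DemoWriter and DemoInfo are hypothetical. The sketch assumes that MicroBatchWriterFactory does little more than supply the epoch ID whenever a writer is requested from the underlying StreamingDataWriterFactory.

```scala
object FactorySketch {
  // Simplified stand-ins for the Spark SQL connector interfaces (assumption: not the real types).
  trait DataWriter
  trait PhysicalWriteInfo { def numPartitions: Int }
  trait DataWriterFactory {
    def createWriter(partitionId: Int, taskId: Long): DataWriter
  }
  trait StreamingDataWriterFactory {
    def createWriter(partitionId: Int, taskId: Long, epochId: Long): DataWriter
  }
  trait StreamingWrite {
    def createStreamingWriterFactory(info: PhysicalWriteInfo): StreamingDataWriterFactory
  }

  // Adapts a streaming writer factory to the batch one by fixing the epoch ID.
  class MicroBatchWriterFactory(
      epochId: Long,
      streamingWriterFactory: StreamingDataWriterFactory) extends DataWriterFactory {
    override def createWriter(partitionId: Int, taskId: Long): DataWriter =
      streamingWriterFactory.createWriter(partitionId, taskId, epochId)
  }

  class MicroBatchWrite(epochId: Long, writeSupport: StreamingWrite) {
    def createBatchWriterFactory(info: PhysicalWriteInfo): DataWriterFactory = {
      // Step 1: ask the StreamingWrite for its StreamingDataWriterFactory.
      val streamingFactory = writeSupport.createStreamingWriterFactory(info)
      // Step 2: wrap it together with the epoch ID.
      new MicroBatchWriterFactory(epochId, streamingFactory)
    }
  }

  // Hypothetical writer and write info that expose what the factories were given.
  final case class DemoWriter(partitionId: Int, taskId: Long, epochId: Long) extends DataWriter
  final case class DemoInfo(numPartitions: Int) extends PhysicalWriteInfo

  def run(): DataWriter = {
    val streamingWrite = new StreamingWrite {
      def createStreamingWriterFactory(info: PhysicalWriteInfo): StreamingDataWriterFactory =
        new StreamingDataWriterFactory {
          def createWriter(partitionId: Int, taskId: Long, epochId: Long): DataWriter =
            DemoWriter(partitionId, taskId, epochId)
        }
    }
    val factory = new MicroBatchWrite(3L, streamingWrite).createBatchWriterFactory(DemoInfo(2))
    factory.createWriter(partitionId = 0, taskId = 11L)
  }
}
```

In this sketch the batch-side caller never mentions an epoch, yet the writer returned by FactorySketch.run() carries the epoch ID that the MicroBatchWrite was created with.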