MicroBatchWrite¶

MicroBatchWrite is a BatchWrite (Spark SQL) for WriteToDataSourceV2 logical operator in Micro-Batch Stream Processing.

WriteToMicroBatchDataSource

WriteToDataSourceV2 logical operator replaces WriteToMicroBatchDataSource logical operator at logical optimization (using V2Writes logical optimization).

MicroBatchWrite is just a very thin wrapper over StreamingWrite and does nothing but delegates (relays) all the important execution-specific calls to it.

Creating Instance¶

MicroBatchWrite takes the following to be created:

Epoch ID
StreamingWrite

MicroBatchWrite is created when:

V2Writes (Spark SQL) logical optimization is requested to optimize a logical plan (with a WriteToMicroBatchDataSource)

Committing Writing Job¶

commit(
  messages: Array[WriterCommitMessage]): Unit

commit is part of the BatchWrite (Spark SQL) abstraction.

commit requests the StreamingWrite to commit.

Creating DataWriterFactory for Batch Write¶

createBatchWriterFactory(
  info: PhysicalWriteInfo): DataWriterFactory

createBatchWriterFactory is part of the BatchWrite (Spark SQL) abstraction.

createBatchWriterFactory requests the StreamingWrite to create a StreamingDataWriterFactory.

In the end, createBatchWriterFactory creates a MicroBatchWriterFactory (with the given epochId and the StreamingDataWriterFactory).