MicroBatchScanExec Physical Operator

MicroBatchScanExec is a DataSourceV2ScanExecBase (Spark SQL) that represents a StreamingDataSourceV2Relation logical operator at execution time.

MicroBatchScanExec is a thin wrapper over a MicroBatchStream: it does little more than delegate the execution-specific preparation to the stream.

Creating Instance

MicroBatchScanExec takes the following to be created:

MicroBatchScanExec is created when:

Note

All the properties needed to create a MicroBatchScanExec are copied directly from a StreamingDataSourceV2Relation.

MicroBatchStream

MicroBatchScanExec is given a MicroBatchStream when created.

The MicroBatchStream is the SparkDataStream of the StreamingDataSourceV2Relation logical operator that the MicroBatchScanExec was created from.

MicroBatchScanExec uses the MicroBatchStream to implement inputPartitions, readerFactory and inputRDD (all of which MicroBatchScanExec overrides).
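
The delegation described above can be pictured with a small, self-contained sketch. Every type in it (Offset, InputPartition, PartitionReaderFactory, MicroBatchStream, ScanSketch, RecordingStream) is a simplified stand-in defined only for this illustration, not the real Spark interface, and the use of lazy vals is an assumption of the sketch rather than a statement about the Spark source.

```scala
// Simplified stand-ins (assumptions for illustration), NOT the real Spark types.
object DelegationSketch {
  final case class Offset(value: Long)
  final case class InputPartition(from: Long, until: Long)

  trait PartitionReaderFactory {
    def describe: String
  }

  // Only the two methods that the text mentions are modelled here.
  trait MicroBatchStream {
    def planInputPartitions(start: Offset, end: Offset): Seq[InputPartition]
    def createReaderFactory(): PartitionReaderFactory
  }

  // The "thin wrapper": it holds the stream and the offsets and simply
  // forwards every request to the stream.
  final case class ScanSketch(stream: MicroBatchStream, start: Offset, end: Offset) {
    lazy val inputPartitions: Seq[InputPartition] =
      stream.planInputPartitions(start, end)

    lazy val readerFactory: PartitionReaderFactory =
      stream.createReaderFactory()
  }

  // Hypothetical stream that records which methods were called on it.
  final class RecordingStream extends MicroBatchStream {
    var calls: Vector[String] = Vector.empty

    def planInputPartitions(start: Offset, end: Offset): Seq[InputPartition] = {
      calls = calls :+ s"planInputPartitions(${start.value}, ${end.value})"
      Seq(InputPartition(start.value, end.value))
    }

    def createReaderFactory(): PartitionReaderFactory = {
      calls = calls :+ "createReaderFactory()"
      new PartitionReaderFactory {
        def describe: String = "recording"
      }
    }
  }
}
```

In this sketch nothing is requested from the stream until inputPartitions or readerFactory is first accessed, and the wrapper itself contains no planning logic of its own.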

Input Partitions

inputPartitions: Seq[InputPartition]

inputPartitions is part of the DataSourceV2ScanExecBase (Spark SQL) abstraction.


inputPartitions requests the MicroBatchStream to planInputPartitions for the start and end offsets.
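
To make the role of the start and end offsets concrete, here is a minimal sketch of how a stream might turn an offset range into input partitions. LongOffset, RangePartition and the numPartitions parameter are hypothetical names introduced for this example, and splitting the range evenly is just one possible strategy; a real MicroBatchStream decides for itself how to plan partitions.

```scala
// Hypothetical model: offsets are plain Longs and a partition is a half-open
// range [from, until). These are assumptions, not Spark's own classes.
object PlanPartitionsSketch {
  final case class LongOffset(value: Long)
  final case class RangePartition(from: Long, until: Long)

  // Splits the records between `start` and `end` into at most
  // `numPartitions` contiguous, non-empty ranges.
  def planInputPartitions(
      start: LongOffset,
      end: LongOffset,
      numPartitions: Int): Seq[RangePartition] = {
    require(numPartitions > 0, "numPartitions must be positive")
    require(end.value >= start.value, "end offset must not precede start offset")

    val total = end.value - start.value
    (0 until numPartitions)
      .map { i =>
        val from = start.value + total * i / numPartitions
        val until = start.value + total * (i + 1) / numPartitions
        RangePartition(from, until)
      }
      .filter(p => p.from < p.until) // drop empty ranges
  }
}
```

With this model, a micro-batch whose start and end offsets are equal yields no partitions at all, so there would be nothing to read for that batch.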

PartitionReaderFactory

readerFactory: PartitionReaderFactory

readerFactory is part of the DataSourceV2ScanExecBase (Spark SQL) abstraction.


readerFactory requests the MicroBatchStream to createReaderFactory.

Input RDD

inputRDD: RDD[InternalRow]

inputRDD is part of the DataSourceV2ScanExecBase (Spark SQL) abstraction.


inputRDD creates a DataSourceRDD (Spark SQL) for the input partitions and the PartitionReaderFactory (among other arguments).
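
As a rough illustration of why the RDD needs both the partitions and the reader factory, the sketch below reads every partition through a reader obtained from the factory. All types here (RangePartition, PartitionReader, PartitionReaderFactory, RangeReaderFactory) are simplified stand-ins written for this example, and the eager compute and collect helpers are assumptions; an actual RDD would compute partitions lazily and in parallel on executors.

```scala
// Simplified stand-ins (assumptions for illustration), NOT the real Spark types.
object DataSourceRddSketch {
  final case class RangePartition(from: Long, until: Long)

  trait PartitionReader[T] extends AutoCloseable {
    def next(): Boolean // advances to the next record, false when exhausted
    def get(): T        // returns the current record
  }

  trait PartitionReaderFactory {
    def createReader(partition: RangePartition): PartitionReader[Long]
  }

  // Hypothetical factory whose readers emit every Long in [from, until).
  object RangeReaderFactory extends PartitionReaderFactory {
    def createReader(partition: RangePartition): PartitionReader[Long] =
      new PartitionReader[Long] {
        private var current = partition.from - 1
        def next(): Boolean = { current += 1; current < partition.until }
        def get(): Long = current
        def close(): Unit = ()
      }
  }

  // Reads one partition: create a reader, drain it, always close it.
  def compute(partition: RangePartition, factory: PartitionReaderFactory): Vector[Long] = {
    val reader = factory.createReader(partition)
    try {
      val rows = Vector.newBuilder[Long]
      while (reader.next()) rows += reader.get()
      rows.result()
    } finally reader.close()
  }

  // Sequential stand-in for collecting the results of all partitions.
  def collect(partitions: Seq[RangePartition], factory: PartitionReaderFactory): Vector[Long] =
    partitions.toVector.flatMap(p => compute(p, factory))
}
```

The point of the sketch is the division of labour: the partitions say what to read, the factory says how to read it, and the RDD-like code only wires the two together.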