# MicroBatchScanExec Physical Operator
MicroBatchScanExec is a DataSourceV2ScanExecBase (Spark SQL) physical operator that represents a StreamingDataSourceV2Relation logical operator at execution time.
MicroBatchScanExec is a thin wrapper around a MicroBatchStream: it delegates all execution-specific preparation to it.
## Creating Instance
MicroBatchScanExec takes the following to be created:
- Output Attributes (Spark SQL)
- Scan (Spark SQL)
- MicroBatchStream
- Start Offset
- End Offset
- Key-Grouped Partitioning
MicroBatchScanExec is created when:
- DataSourceV2Strategy (Spark SQL) execution planning strategy is requested to plan a logical query plan with a StreamingDataSourceV2Relation
Note: All the properties used to create a MicroBatchScanExec are copied directly from the StreamingDataSourceV2Relation.
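For illustration only, here is a minimal sketch of a plain case class with the same six properties, expressed with the public DataSource V2 connector types. `MicroBatchScanLike` is a made-up name and the sketch does not extend the real (internal) DataSourceV2ScanExecBase:

```scala
import org.apache.spark.sql.catalyst.expressions.{Attribute, Expression}
import org.apache.spark.sql.connector.read.Scan
import org.apache.spark.sql.connector.read.streaming.{MicroBatchStream, Offset}

// Hypothetical stand-in that mirrors the properties MicroBatchScanExec is created with.
case class MicroBatchScanLike(
    output: Seq[Attribute],                           // output attributes
    scan: Scan,                                       // the Scan of the relation
    stream: MicroBatchStream,                         // the MicroBatchStream
    start: Offset,                                    // start offset of the micro-batch
    end: Offset,                                      // end offset of the micro-batch
    keyGroupedPartitioning: Option[Seq[Expression]])  // key-grouped partitioning, if any
```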
## MicroBatchStream
MicroBatchScanExec is given a MicroBatchStream when created.
The MicroBatchStream is the SparkDataStream of the StreamingDataSourceV2Relation logical operator that this MicroBatchScanExec was created from.
MicroBatchScanExec uses the MicroBatchStream to implement inputPartitions, readerFactory and inputRDD (the methods it overrides), as sketched below.
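To make the delegation concrete, below is a minimal, self-contained sketch of a hypothetical MicroBatchStream (a toy source that produces one row per micro-batch). DemoOffset, DemoInputPartition, DemoReaderFactory and DemoMicroBatchStream are made-up names used only for illustration; only the MicroBatchStream, Offset, InputPartition, PartitionReaderFactory and PartitionReader interfaces come from Spark.

```scala
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.connector.read.{InputPartition, PartitionReader, PartitionReaderFactory}
import org.apache.spark.sql.connector.read.streaming.{MicroBatchStream, Offset}

// Hypothetical offset: just a batch number serialized as JSON.
case class DemoOffset(batchId: Long) extends Offset {
  override def json(): String = s"""{"batchId":$batchId}"""
}

// Hypothetical partition descriptor (shipped to executors).
case class DemoInputPartition(batchId: Long) extends InputPartition

// Hypothetical reader factory: one row per partition, carrying the batch id.
object DemoReaderFactory extends PartitionReaderFactory {
  override def createReader(partition: InputPartition): PartitionReader[InternalRow] =
    new PartitionReader[InternalRow] {
      private var consumed = false
      override def next(): Boolean = { val hasNext = !consumed; consumed = true; hasNext }
      override def get(): InternalRow =
        InternalRow(partition.asInstanceOf[DemoInputPartition].batchId)
      override def close(): Unit = ()
    }
}

// A toy MicroBatchStream: every micro-batch has exactly one partition with one row.
class DemoMicroBatchStream extends MicroBatchStream {
  private var latest = 0L

  override def initialOffset(): Offset = DemoOffset(0L)
  override def latestOffset(): Offset = { latest += 1; DemoOffset(latest) }  // advances one batch per call (toy behavior)
  override def deserializeOffset(json: String): Offset =
    DemoOffset("""\d+""".r.findFirstIn(json).get.toLong)

  // MicroBatchScanExec.inputPartitions delegates here with the start and end offsets.
  override def planInputPartitions(start: Offset, end: Offset): Array[InputPartition] =
    Array[InputPartition](DemoInputPartition(end.asInstanceOf[DemoOffset].batchId))

  // MicroBatchScanExec.readerFactory delegates here.
  override def createReaderFactory(): PartitionReaderFactory = DemoReaderFactory

  override def commit(end: Offset): Unit = ()
  override def stop(): Unit = ()
}
```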
## Input Partitions
inputPartitions: Seq[InputPartition]
inputPartitions is part of the DataSourceV2ScanExecBase (Spark SQL) abstraction.
inputPartitions requests the MicroBatchStream to planInputPartitions for the start and end offsets.
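As a rough sketch (not the actual Spark source), the override amounts to a call like the one below, where stream, start and end stand for the constructor properties; planMicroBatchPartitions is a made-up helper name:

```scala
import org.apache.spark.sql.connector.read.InputPartition
import org.apache.spark.sql.connector.read.streaming.{MicroBatchStream, Offset}

// Hypothetical helper mirroring the delegation: ask the stream to plan the
// partitions that cover the [start, end] offset range of this micro-batch.
def planMicroBatchPartitions(
    stream: MicroBatchStream,
    start: Offset,
    end: Offset): Seq[InputPartition] =
  stream.planInputPartitions(start, end).toSeq
```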
## PartitionReaderFactory
readerFactory: PartitionReaderFactory
readerFactory is part of the DataSourceV2ScanExecBase (Spark SQL) abstraction.
readerFactory requests the MicroBatchStream to createReaderFactory.
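Again as a hedged sketch, the delegation is a single call on the stream; the resulting factory is later used on executors to open a PartitionReader per InputPartition (microBatchReaderFactory is a made-up helper name):

```scala
import org.apache.spark.sql.connector.read.PartitionReaderFactory
import org.apache.spark.sql.connector.read.streaming.MicroBatchStream

// Hypothetical helper mirroring the delegation: the factory is created once
// (on the driver) and later shipped to executors to open per-partition readers.
def microBatchReaderFactory(stream: MicroBatchStream): PartitionReaderFactory =
  stream.createReaderFactory()
```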
## Input RDD
inputRDD: RDD[InternalRow]
inputRDD is part of the DataSourceV2ScanExecBase (Spark SQL) abstraction.
inputRDD creates a DataSourceRDD (Spark SQL) with the input partitions and the PartitionReaderFactory (among other properties).
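DataSourceRDD is internal to Spark SQL and its constructor has varied across Spark versions, so rather than calling it directly, here is a simplified sketch of the per-partition work such an RDD performs at runtime: open a PartitionReader for an InputPartition via the factory and pull rows with next()/get(). readPartition is a made-up helper name.

```scala
import scala.collection.mutable.ArrayBuffer

import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.connector.read.{InputPartition, PartitionReaderFactory}

// Simplified sketch of what the input RDD does for a single partition:
// open a reader, drain it into a buffer, and always close the reader.
def readPartition(
    factory: PartitionReaderFactory,
    partition: InputPartition): Seq[InternalRow] = {
  val reader = factory.createReader(partition)
  val rows = ArrayBuffer.empty[InternalRow]
  try {
    while (reader.next()) {
      rows += reader.get().copy()  // copy: readers are allowed to reuse the returned row
    }
  } finally {
    reader.close()
  }
  rows.toSeq
}
```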