BatchScanExec Physical Operator¶
BatchScanExec is a DataSourceV2ScanExecBase leaf physical operator for scanning a batch of data from a Scan.
BatchScanExec represents a data scan over a DataSourceV2ScanRelation relation at execution time.
Creating Instance¶
BatchScanExec takes the following to be created:
- Output Schema (Seq[AttributeReference])
- Scan
- Runtime Filters (Expressions)
- Optional SortOrders
- Table
- StoragePartitionJoinParams
BatchScanExec is created when:
- DataSourceV2Strategy execution planning strategy is executed (for physical operators with a DataSourceV2ScanRelation relation)
Input RDD¶
DataSourceV2ScanExecBase
inputRDD: RDD[InternalRow]
inputRDD is part of the DataSourceV2ScanExecBase abstraction.
When filteredPartitions is empty and the outputPartitioning is SinglePartition, inputRDD creates an empty RDD[InternalRow] with 1 partition.
Otherwise, inputRDD creates a DataSourceRDD as follows:
| DataSourceRDD's Attribute | Value |
|---|---|
| InputPartitions | filteredPartitions |
| PartitionReaderFactory | readerFactory |
| columnarReads | supportsColumnar |
| Custom Metrics | customMetrics |
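The branching above can be sketched without Spark on the classpath. The following is a minimal model, not the operator's actual code: every type in it (`Partitioning`, `InputRDDPlan`, `DataSourceRDDPlan`, and so on) is a hypothetical stand-in that only shares a name with its Spark counterpart, and partitions are represented as plain strings.

```scala
object InputRDDSketch {
  // Hypothetical, Spark-free stand-ins for the partitioning of the scan output
  sealed trait Partitioning
  case object SinglePartition extends Partitioning
  final case class UnknownPartitioning(numPartitions: Int) extends Partitioning

  // Describes which kind of RDD the operator would create (a model, not an RDD)
  sealed trait InputRDDPlan
  case object EmptyRDDWithOnePartition extends InputRDDPlan
  final case class DataSourceRDDPlan(
      inputPartitions: Seq[Seq[String]], // stands in for filteredPartitions
      columnarReads: Boolean             // stands in for supportsColumnar
  ) extends InputRDDPlan

  // Mirrors the decision described above: an empty single-partition RDD
  // only when nothing is left to scan AND the output is a single partition
  def planInputRDD(
      filteredPartitions: Seq[Seq[String]],
      outputPartitioning: Partitioning,
      supportsColumnar: Boolean): InputRDDPlan =
    if (filteredPartitions.isEmpty && outputPartitioning == SinglePartition)
      EmptyRDDWithOnePartition
    else
      DataSourceRDDPlan(filteredPartitions, supportsColumnar)
}
```

Note that both conditions must hold for the empty-RDD branch: empty filteredPartitions with any other output partitioning still goes through the DataSourceRDD path in this model.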
Filtered Input Partitions¶
filteredPartitions: Seq[Seq[InputPartition]]
Lazy Value
filteredPartitions is a Scala lazy value to guarantee that the code to initialize it is executed once only (when accessed for the first time) and the computed value never changes afterwards.
Learn more in the Scala Language Specification.
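As a small illustration of the once-only initialization, here is a plain Scala sketch. The object `LazyValDemo`, its counter, and the string-based partitions are hypothetical and exist only for the demonstration; they are not part of BatchScanExec.

```scala
object LazyValDemo {
  // Counts how many times the initializer below runs (demonstration only)
  var initCount = 0

  // Hypothetical stand-in for filteredPartitions:
  // the block runs on first access and the result is cached afterwards
  lazy val partitions: Seq[Seq[String]] = {
    initCount += 1
    Seq(Seq("p0"), Seq("p1", "p2"))
  }
}
```

Accessing `LazyValDemo.partitions` any number of times leaves `initCount` at 1, and the counter stays at 0 until the first access.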
For non-empty runtimeFilters, filteredPartitions...FIXME
Otherwise, filteredPartitions is the partitions (usually the input partitions of this BatchScanExec).
Input Partitions¶
DataSourceV2ScanExecBase
inputPartitions: Seq[InputPartition]
inputPartitions is part of the DataSourceV2ScanExecBase abstraction.
inputPartitions requests the Batch to plan input partitions.
PartitionReaderFactory¶
DataSourceV2ScanExecBase
readerFactory: PartitionReaderFactory
readerFactory is part of the DataSourceV2ScanExecBase abstraction.
readerFactory requests the Batch to createReaderFactory.
Batch¶
batch: Batch
batch requests the Scan for the physical representation of a batch query.
batch is used when:
- BatchScanExec is requested for the input partitions and readerFactory
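The delegation between the Scan, the Batch, the input partitions and the reader factory can be modeled as follows. This is a Spark-free sketch under stated assumptions: the traits below are simplified, hypothetical stand-ins that borrow the names of Spark's connector interfaces, and `BatchScanModel`, `RangePartition`, `ToyReaderFactory` and `ToyScan` are invented for illustration.

```scala
object BatchSketch {
  // Hypothetical, simplified stand-ins for Spark's connector interfaces
  trait InputPartition
  trait PartitionReaderFactory
  trait Batch {
    def planInputPartitions(): Array[InputPartition]
    def createReaderFactory(): PartitionReaderFactory
  }
  trait Scan {
    def toBatch: Batch
  }

  // Models only the delegation described in this section, not the real operator
  final class BatchScanModel(scan: Scan) {
    // The Scan is asked once for its batch representation
    lazy val batch: Batch = scan.toBatch
    // Both members below simply delegate to the Batch
    lazy val inputPartitions: Seq[InputPartition] = batch.planInputPartitions().toSeq
    lazy val readerFactory: PartitionReaderFactory = batch.createReaderFactory()
  }

  // A toy scan with two partitions (illustrative only)
  final case class RangePartition(start: Int, end: Int) extends InputPartition
  object ToyReaderFactory extends PartitionReaderFactory
  object ToyScan extends Scan {
    def toBatch: Batch = new Batch {
      def planInputPartitions(): Array[InputPartition] =
        Array(RangePartition(0, 5), RangePartition(5, 10))
      def createReaderFactory(): PartitionReaderFactory = ToyReaderFactory
    }
  }
}
```

With `new BatchSketch.BatchScanModel(BatchSketch.ToyScan)`, `inputPartitions` holds the two `RangePartition`s planned by the toy batch and `readerFactory` is `ToyReaderFactory`.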
keyGroupedPartitioning¶
DataSourceV2ScanExecBase
keyGroupedPartitioning: Option[Seq[Expression]]
keyGroupedPartitioning is part of the DataSourceV2ScanExecBase abstraction.
keyGroupedPartitioning requests the StoragePartitionJoinParams for the keyGroupedPartitioning.