BatchScanExec Physical Operator
BatchScanExec is a DataSourceV2ScanExecBase leaf physical operator for scanning a batch of data from a Scan.
BatchScanExec represents a data scan over a DataSourceV2ScanRelation relation at execution.
Creating Instance
BatchScanExec takes the following to be created:
- Output Schema (Seq[AttributeReference])
- Scan
- Runtime Filters
- Key Grouped Partitioning
- Ordering
- Table
- Common Partition Values
- applyPartialClustering flag (default: false)
- replicatePartitions flag (default: false)
BatchScanExec is created when:
- DataSourceV2Strategy execution planning strategy is executed (for physical operators with a DataSourceV2ScanRelation relation)
Input RDD
When there are no filteredPartitions and the outputPartitioning is SinglePartition, inputRDD creates an empty RDD[InternalRow] with 1 partition.
Otherwise, inputRDD creates a DataSourceRDD as follows:
DataSourceRDD's Attribute | Value |
---|---|
InputPartitions | filteredPartitions |
PartitionReaderFactory | readerFactory |
columnarReads | supportsColumnar |
Custom Metrics | customMetrics |
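
The branching described in this section can be sketched in plain Scala without a Spark dependency. This is a simplified model, not the Spark implementation: Partitioning, UnknownPartitioning, RddPlan, EmptyRdd and DataSourceRddPlan are hypothetical stand-ins introduced only for illustration, and input partitions are modelled as strings.

```scala
object InputRddSketch {
  // Hypothetical stand-ins for Spark's partitioning types (not the real classes).
  sealed trait Partitioning
  case object SinglePartition extends Partitioning
  final case class UnknownPartitioning(numPartitions: Int) extends Partitioning

  // Hypothetical description of the RDD that would be created.
  sealed trait RddPlan
  final case class EmptyRdd(numPartitions: Int) extends RddPlan
  final case class DataSourceRddPlan(
      inputPartitions: Seq[Seq[String]],
      columnarReads: Boolean) extends RddPlan

  // Mirrors the two cases described above:
  // no filtered partitions + SinglePartition => empty RDD with 1 partition,
  // anything else => a DataSourceRDD over the filtered partitions.
  def inputRDD(
      filteredPartitions: Seq[Seq[String]],
      outputPartitioning: Partitioning,
      supportsColumnar: Boolean): RddPlan =
    if (filteredPartitions.isEmpty && outputPartitioning == SinglePartition) {
      EmptyRdd(numPartitions = 1)
    } else {
      DataSourceRddPlan(filteredPartitions, columnarReads = supportsColumnar)
    }
}
```

Note that in this model both conditions must hold for the empty-RDD case. Empty filtered partitions with any other output partitioning still take the DataSourceRDD branch.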
Filtered Input Partitions
filteredPartitions: Seq[Seq[InputPartition]]
Lazy Value
filteredPartitions is a Scala lazy value, which guarantees that its initialization code is executed only once (when accessed for the first time) and that the computed value never changes afterwards.
Learn more in the Scala Language Specification.
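
The once-only initialization of a lazy value can be illustrated with a small, Spark-free sketch. LazyValDemo, initCount and the partition names are hypothetical and exist only for this example.

```scala
// Plain Scala, no Spark required.
class LazyValDemo {
  // Counts how many times the initializer body of `partitions` runs.
  var initCount: Int = 0

  // Hypothetical stand-in for an expensive computation such as partition planning.
  lazy val partitions: Seq[String] = {
    initCount += 1
    Seq("part-0", "part-1")
  }
}

object LazyValDemoApp {
  def main(args: Array[String]): Unit = {
    val demo = new LazyValDemo
    // Nothing has been computed until the first access.
    println(s"before first access: ${demo.initCount}")
    val first = demo.partitions
    val second = demo.partitions
    // The initializer ran once and both accesses return the same instance.
    println(s"after two accesses: ${demo.initCount}, same instance: ${first eq second}")
  }
}
```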
For non-empty runtimeFilters, filteredPartitions ...FIXME
Otherwise, filteredPartitions is the partitions (which are usually the input partitions of this BatchScanExec).
Input Partitions
Signature
inputPartitions: Seq[InputPartition]
inputPartitions is part of the DataSourceV2ScanExecBase abstraction.
inputPartitions requests the Batch to plan input partitions.
PartitionReaderFactory
Signature
readerFactory: PartitionReaderFactory
readerFactory is part of the DataSourceV2ScanExecBase abstraction.
readerFactory requests the Batch to createReaderFactory.
Batch
batch: Batch
batch requests the Scan for the physical representation of a batch query.
batch is used when:
- BatchScanExec is requested for input partitions and readerFactory
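
The delegation from Scan to Batch to input partitions and reader factory can be sketched as follows. The traits below are simplified stand-ins named after the Data Source V2 interfaces (Scan, Batch, InputPartition, PartitionReaderFactory), not the Spark types themselves, and ScanExecModel is a hypothetical class that models only the three members discussed above.

```scala
object BatchDelegationSketch {
  // Simplified stand-ins for the connector interfaces (assumptions for illustration).
  trait InputPartition
  trait PartitionReaderFactory
  trait Batch {
    def planInputPartitions(): Array[InputPartition]
    def createReaderFactory(): PartitionReaderFactory
  }
  trait Scan {
    def toBatch: Batch
  }

  // Hypothetical model of the operator's delegation chain.
  class ScanExecModel(scan: Scan) {
    // The Scan is asked for its batch representation at most once.
    lazy val batch: Batch = scan.toBatch

    // Input partitions are planned by the Batch.
    lazy val inputPartitions: Seq[InputPartition] =
      batch.planInputPartitions().toSeq

    // The reader factory is created by the Batch.
    lazy val readerFactory: PartitionReaderFactory =
      batch.createReaderFactory()
  }
}
```

Because batch is a lazy value in this sketch, requesting both inputPartitions and readerFactory converts the Scan to a Batch only once.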