BatchScanExec Physical Operator¶
BatchScanExec
is a DataSourceV2ScanExecBase leaf physical operator for scanning a batch of data from a Scan.
BatchScanExec
represents a data scan over a DataSourceV2ScanRelation relation at execution.
Creating Instance¶
BatchScanExec
takes the following to be created:
- Output Schema (
Seq[AttributeReference]
) - Scan
- Runtime Filters (Expressions)
- Optional SortOrders
- Table
-
StoragePartitionJoinParams
BatchScanExec
is created when:
- DataSourceV2Strategy execution planning strategy is executed (for physical operators with a DataSourceV2ScanRelation relation)
Input RDD¶
DataSourceV2ScanExecBase
inputRDD: RDD[InternalRow]
inputRDD
is part of the DataSourceV2ScanExecBase abstraction.
For no filteredPartitions and the outputPartitioning to be SinglePartition
, inputRDD
creates an empty RDD[InternalRow]
with 1 partition.
Otherwise, inputRDD
creates a DataSourceRDD as follows:
DataSourceRDD's Attribute | Value |
---|---|
InputPartitions | filteredPartitions |
PartitionReaderFactory | readerFactory |
columnarReads | supportsColumnar |
Custom Metrics | customMetrics |
Filtered Input Partitions¶
filteredPartitions: Seq[Seq[InputPartition]]
Lazy Value
filteredPartitions
is a Scala lazy value to guarantee that the code to initialize it is executed once only (when accessed for the first time) and the computed value never changes afterwards.
Learn more in the Scala Language Specification.
For non-empty runtimeFilters, filteredPartitions
...FIXME
Otherwise, filteredPartitions
is the partitions (that usually is the input partitions of this BatchScanExec
).
Input Partitions¶
DataSourceV2ScanExecBase
inputPartitions: Seq[InputPartition]
inputPartitions
is part of the DataSourceV2ScanExecBase abstraction.
inputPartitions
requests the Batch to plan input partitions.
PartitionReaderFactory¶
DataSourceV2ScanExecBase
readerFactory: PartitionReaderFactory
readerFactory
is part of the DataSourceV2ScanExecBase abstraction.
readerFactory
requests the Batch to createReaderFactory.
Batch¶
batch: Batch
batch
requests the Scan for the physical representation for batch query.
batch
is used when:
BatchScanExec
is requested for partitions and readerFactory
keyGroupedPartitioning¶
DataSourceV2ScanExecBase
keyGroupedPartitioning: Option[Seq[Expression]]
keyGroupedPartitioning
is part of the DataSourceV2ScanExecBase abstraction.
keyGroupedPartitioning
requests this StoragePartitionJoinParams for the keyGroupedPartitioning
.