BatchScanExec Physical Operator

BatchScanExec is a DataSourceV2ScanExecBase leaf physical operator for scanning a batch of data from a Scan.

BatchScanExec represents a data scan over a DataSourceV2ScanRelation at execution time.

Creating Instance

BatchScanExec takes the following to be created:

  • Output Schema (Seq[AttributeReference])
  • Scan
  • Runtime Filters
  • Key Grouped Partitioning
  • Ordering
  • Table
  • Common Partition Values
  • applyPartialClustering flag (default: false)
  • replicatePartitions flag (default: false)

BatchScanExec is created when:

Input RDD

Signature
inputRDD: RDD[InternalRow]

inputRDD is part of the DataSourceV2ScanExecBase abstraction.

When filteredPartitions is empty and the outputPartitioning is SinglePartition, inputRDD creates an empty RDD[InternalRow] with 1 partition.

Otherwise, inputRDD creates a DataSourceRDD as follows:

DataSourceRDD's Attribute    Value
InputPartitions              filteredPartitions
PartitionReaderFactory       readerFactory
columnarReads                supportsColumnar
Custom Metrics               customMetrics
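
The branching described above can be sketched with plain Scala stand-ins. This is a minimal model, not Spark's implementation: every type in InputRddSketch (including ScanRDD, EmptySinglePartitionRDD, DataSourceRDDModel and UnknownPartitioning) is a hypothetical simplification introduced for illustration, and the model assumes one RDD partition per group of input partitions.

```scala
object InputRddSketch {
  // Hypothetical stand-ins for Spark's partitioning types (not the real classes).
  sealed trait Partitioning
  case object SinglePartition extends Partitioning
  final case class UnknownPartitioning(numPartitions: Int) extends Partitioning

  // Hypothetical stand-in for an InputPartition.
  final case class InputPartition(id: Int)

  // Hypothetical stand-in for RDD[InternalRow]; only the partition count is modelled.
  sealed trait ScanRDD { def numPartitions: Int }

  // Models the empty RDD[InternalRow] with exactly one partition.
  case object EmptySinglePartitionRDD extends ScanRDD {
    val numPartitions: Int = 1
  }

  // Models a DataSourceRDD; assumes one RDD partition per group of input partitions.
  final case class DataSourceRDDModel(
      inputPartitions: Seq[Seq[InputPartition]],
      columnarReads: Boolean) extends ScanRDD {
    def numPartitions: Int = inputPartitions.size
  }

  // Mirrors the two cases in the text: the empty single-partition RDD,
  // otherwise a DataSourceRDD built from the filtered partitions.
  def inputRDD(
      filteredPartitions: Seq[Seq[InputPartition]],
      outputPartitioning: Partitioning,
      supportsColumnar: Boolean): ScanRDD =
    if (filteredPartitions.isEmpty && outputPartitioning == SinglePartition)
      EmptySinglePartitionRDD
    else
      DataSourceRDDModel(filteredPartitions, supportsColumnar)

  def main(args: Array[String]): Unit = {
    val groups = Seq(Seq(InputPartition(0)), Seq(InputPartition(1)))
    println(inputRDD(Seq.empty, SinglePartition, supportsColumnar = false))
    println(inputRDD(groups, UnknownPartitioning(2), supportsColumnar = true))
  }
}
```

The point of the special case in this model is that a scan reporting SinglePartition still yields exactly one partition when there is nothing to read.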

Filtered Input Partitions

filteredPartitions: Seq[Seq[InputPartition]]
Lazy Value

filteredPartitions is a Scala lazy value, which guarantees that its initialization code is executed only once (when the value is accessed for the first time) and that the computed value never changes afterwards.

Learn more in the Scala Language Specification.
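
The once-only initialization of a lazy value can be seen in a small standalone sketch. The names LazyValSketch, initCount and partitions are hypothetical and stand in for an expensive computation such as filtering partitions; nothing here is Spark code.

```scala
object LazyValSketch {
  // Counts how many times the initializer body runs.
  var initCount: Int = 0

  // Hypothetical stand-in for an expensive, compute-once value.
  lazy val partitions: Seq[Int] = {
    initCount += 1
    Seq(1, 2, 3)
  }

  def main(args: Array[String]): Unit = {
    val first = partitions   // first access runs the initializer
    val second = partitions  // second access reuses the cached value
    println(s"same instance: ${first eq second}, initializer runs: $initCount")
  }
}
```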

For non-empty runtimeFilters, filteredPartitions...FIXME

Otherwise, filteredPartitions is the partitions (usually the input partitions of this BatchScanExec).

Input Partitions

Signature
inputPartitions: Seq[InputPartition]

inputPartitions is part of the DataSourceV2ScanExecBase abstraction.

inputPartitions requests the Batch to plan input partitions.

PartitionReaderFactory

Signature
readerFactory: PartitionReaderFactory

readerFactory is part of the DataSourceV2ScanExecBase abstraction.

readerFactory requests the Batch to createReaderFactory.

Batch

batch: Batch

batch requests the Scan for its physical representation for a batch query (Scan.toBatch).
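
The delegation from batch to inputPartitions and readerFactory can be sketched as follows. The traits below are simplified stand-ins that only borrow the method names toBatch, planInputPartitions and createReaderFactory from Spark's connector read API; BatchScanModel, RangePartition, ToyReaderFactory and ToyScan are hypothetical, and making all three members lazy is an assumption of this sketch.

```scala
object BatchSketch {
  // Simplified stand-ins; only the method names follow Spark's connector read API.
  trait InputPartition
  trait PartitionReaderFactory
  trait Batch {
    def planInputPartitions(): Array[InputPartition]
    def createReaderFactory(): PartitionReaderFactory
  }
  trait Scan {
    def toBatch: Batch
  }

  // Hypothetical model of the operator's delegation; not Spark's BatchScanExec.
  final class BatchScanModel(scan: Scan) {
    // The Batch is obtained from the Scan once and then reused.
    lazy val batch: Batch = scan.toBatch
    // Input partitions are planned by the Batch.
    lazy val inputPartitions: Seq[InputPartition] = batch.planInputPartitions().toSeq
    // The reader factory is created by the same Batch.
    lazy val readerFactory: PartitionReaderFactory = batch.createReaderFactory()
  }

  // A toy data source that splits the range [0, 10) into two partitions.
  final case class RangePartition(start: Int, end: Int) extends InputPartition
  object ToyReaderFactory extends PartitionReaderFactory
  object ToyScan extends Scan {
    def toBatch: Batch = new Batch {
      def planInputPartitions(): Array[InputPartition] =
        Array(RangePartition(0, 5), RangePartition(5, 10))
      def createReaderFactory(): PartitionReaderFactory = ToyReaderFactory
    }
  }

  def main(args: Array[String]): Unit = {
    val model = new BatchScanModel(ToyScan)
    model.inputPartitions.foreach(println)
  }
}
```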


batch is used when: