Skip to content

DataSourceV2ScanExecBase Leaf Physical Operators

DataSourceV2ScanExecBase is an extension of LeafExecNode abstraction for leaf physical operators that track number of output rows when executed (with or without support for columnar reads).


Input Partitions

partitions: Seq[InputPartition]

Used when:

Input RDD

inputRDD: RDD[InternalRow]


keyGroupedPartitioning: Option[Seq[Expression]]


readerFactory: PartitionReaderFactory

PartitionReaderFactory for partition readers (of the inputPartitions)

Used when:

  • BatchScanExec physical operator is requested for an input RDD
  • ContinuousScanExec and MicroBatchScanExec physical operators (from Spark Structured Streaming) are requested for an inputRDD
  • DataSourceV2ScanExecBase physical operator is requested to outputPartitioning or supportsColumnar


scan: Scan



Executing Physical Operator

doExecute(): RDD[InternalRow]

doExecute is part of the SparkPlan abstraction.



doExecuteColumnar(): RDD[ColumnarBatch]

doExecuteColumnar is part of the SparkPlan abstraction.


Performance Metrics

metrics: Map[String, SQLMetric]

metrics is part of the SparkPlan abstraction.

metrics is the following SQLMetrics with the customMetrics:

Metric Name web UI
numOutputRows number of output rows

Output Data Partitioning Requirements

outputPartitioning: physical.Partitioning

outputPartitioning is part of the SparkPlan abstraction.


Simple Node Description

    maxFields: Int): String

simpleString is part of the TreeNode abstraction.



supportsColumnar: Boolean

supportsColumnar is part of the SparkPlan abstraction.

supportsColumnar is true if the PartitionReaderFactory can supportColumnarReads for all the inputPartitions. Otherwise, supportsColumnar is false.

supportsColumnar makes sure that either all the inputPartitions are supportColumnarReads or none, or throws an IllegalArgumentException:

Cannot mix row-based and columnar input partitions.

Custom Metrics

customMetrics: Map[String, SQLMetric]
Lazy Value

customMetrics is a Scala lazy value to guarantee that the code to initialize it is executed once only (when accessed for the first time) and the computed value never changes afterwards.

Learn more in the Scala Language Specification.

customMetrics requests the Scan for supportedCustomMetrics that are then converted to SQLMetrics.

customMetrics is used when:

  • DataSourceV2ScanExecBase is requested for the performance metrics
  • BatchScanExec is requested for the inputRDD
  • ContinuousScanExec is requested for the inputRDD
  • MicroBatchScanExec is requested for the inputRDD (that creates a DataSourceRDD)