
DataSourceV2ScanExecBase Leaf Physical Operators

DataSourceV2ScanExecBase is an extension of the LeafExecNode abstraction for leaf physical operators that track the number of output rows when executed (with or without support for columnar reads).

Contract

Input Partitions

partitions: Seq[InputPartition]


Input RDD

inputRDD: RDD[InternalRow]


keyGroupedPartitioning

keyGroupedPartitioning: Option[Seq[Expression]]

Optional partitioning expressions


PartitionReaderFactory

readerFactory: PartitionReaderFactory

PartitionReaderFactory for partition readers (of the input partitions)


Used when:

  • BatchScanExec physical operator is requested for an input RDD
  • ContinuousScanExec and MicroBatchScanExec physical operators (Spark Structured Streaming) are requested for an inputRDD
  • DataSourceV2ScanExecBase physical operator is requested for the output data partitioning (outputPartitioning) or whether it supports columnar reads (supportsColumnar)

Scan

scan: Scan

Scan
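
Put together, the contract can be pictured as the following simplified Scala trait. This is a sketch based only on the members listed above; the actual trait in Apache Spark declares more members and helper methods:

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions.Expression
import org.apache.spark.sql.connector.read.{InputPartition, PartitionReaderFactory, Scan}
import org.apache.spark.sql.execution.LeafExecNode

// Simplified sketch of the contract (not the actual Spark source)
trait DataSourceV2ScanExecBase extends LeafExecNode {
  def partitions: Seq[InputPartition]                 // Input Partitions
  def inputRDD: RDD[InternalRow]                      // Input RDD
  def keyGroupedPartitioning: Option[Seq[Expression]] // optional partitioning expressions
  def readerFactory: PartitionReaderFactory           // creates partition readers
  def scan: Scan                                      // the underlying Scan
}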

Implementations

  • BatchScanExec
  • ContinuousScanExec (Spark Structured Streaming)
  • MicroBatchScanExec (Spark Structured Streaming)

Executing Physical Operator

doExecute(): RDD[InternalRow]

doExecute is part of the SparkPlan abstraction.


doExecute...FIXME
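
Given that the operator only hands rows from the input RDD over while tracking the number of output rows (see the introduction and Performance Metrics), a plausible sketch of doExecute looks as follows. This is an illustration under that assumption, not a verbatim copy of the Spark source:

// Sketch: pass every row of the input RDD through and count it (numOutputRows metric)
override def doExecute(): RDD[InternalRow] = {
  val numOutputRows = longMetric("numOutputRows")
  inputRDD.map { row =>
    numOutputRows += 1
    row
  }
}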

doExecuteColumnar

doExecuteColumnar(): RDD[ColumnarBatch]

doExecuteColumnar is part of the SparkPlan abstraction.


doExecuteColumnar...FIXME
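
For columnar reads the same counting happens per ColumnarBatch. A sketch under the assumption that the input RDD carries ColumnarBatches when columnar reads are enabled (not a verbatim copy of the Spark source):

import org.apache.spark.sql.vectorized.ColumnarBatch

// Sketch: count the rows of every ColumnarBatch that flows through
override def doExecuteColumnar(): RDD[ColumnarBatch] = {
  val numOutputRows = longMetric("numOutputRows")
  inputRDD.asInstanceOf[RDD[ColumnarBatch]].map { batch =>
    numOutputRows += batch.numRows()
    batch
  }
}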

Performance Metrics

metrics: Map[String, SQLMetric]

metrics is part of the SparkPlan abstraction.


metrics is the following SQLMetrics together with the customMetrics:

Metric Name     Name in web UI
numOutputRows   number of output rows
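
In code, that roughly amounts to combining the built-in numOutputRows metric with the customMetrics. A sketch (the exact wiring may differ across Spark versions):

import org.apache.spark.sql.execution.metric.{SQLMetric, SQLMetrics}

// Sketch: the numOutputRows metric plus whatever custom metrics the Scan reports
override lazy val metrics: Map[String, SQLMetric] =
  Map("numOutputRows" ->
    SQLMetrics.createMetric(sparkContext, "number of output rows")) ++ customMetrics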

Output Data Partitioning Requirements

outputPartitioning: physical.Partitioning

outputPartitioning is part of the SparkPlan abstraction.


outputPartitioning...FIXME

Simple Node Description

simpleString(
    maxFields: Int): String

simpleString is part of the TreeNode abstraction.


simpleString...FIXME

supportsColumnar

supportsColumnar: Boolean

supportsColumnar is part of the SparkPlan abstraction.


supportsColumnar is true if the PartitionReaderFactory can supportColumnarReads for all the input partitions. Otherwise, supportsColumnar is false.


supportsColumnar makes sure that either all the input partitions support columnar reads (supportColumnarReads) or none of them do, and throws an IllegalArgumentException otherwise:

Cannot mix row-based and columnar input partitions.
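
The check can be sketched as follows, using the contract members from this page (a simplified sketch rather than a verbatim copy of the Spark source):

override def supportsColumnar: Boolean = {
  // Either every input partition supports columnar reads or none of them does
  require(
    partitions.forall(p => readerFactory.supportColumnarReads(p)) ||
      !partitions.exists(p => readerFactory.supportColumnarReads(p)),
    "Cannot mix row-based and columnar input partitions.")
  partitions.exists(p => readerFactory.supportColumnarReads(p))
}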

Custom Metrics

customMetrics: Map[String, SQLMetric]

Lazy Value

customMetrics is a Scala lazy value to guarantee that the code to initialize it is executed once only (when accessed for the first time) and the computed value never changes afterwards.

Learn more in the Scala Language Specification.

customMetrics requests the Scan for supportedCustomMetrics that are then converted to SQLMetrics.
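
A sketch of that conversion is below. The SQLMetrics.createV2CustomMetric helper used for the conversion is an assumption here; treat the exact helper name as such:

import org.apache.spark.sql.execution.metric.{SQLMetric, SQLMetrics}

// Sketch: one SQLMetric per CustomMetric reported by the Scan
lazy val customMetrics: Map[String, SQLMetric] =
  scan.supportedCustomMetrics().map { customMetric =>
    customMetric.name() -> SQLMetrics.createV2CustomMetric(sparkContext, customMetric)
  }.toMap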


customMetrics is used when:

  • DataSourceV2ScanExecBase is requested for the performance metrics
  • BatchScanExec is requested for the inputRDD
  • ContinuousScanExec is requested for the inputRDD
  • MicroBatchScanExec is requested for the inputRDD (that creates a DataSourceRDD)

verboseStringWithOperatorId

Signature
verboseStringWithOperatorId(): String

verboseStringWithOperatorId is part of the QueryPlan abstraction.

verboseStringWithOperatorId requests the Scan for one of the following (metaDataStr):

In the end, verboseStringWithOperatorId is as follows (based on formattedNodeName and output):

[formattedNodeName]
Output: [output]
[metaDataStr]