# DataSourceV2ScanExecBase Leaf Physical Operators

`DataSourceV2ScanExecBase` is an extension of the `LeafExecNode` abstraction for leaf physical operators that track the number of output rows when executed (with or without support for columnar reads).
## Contract

### Input Partitions

```scala
partitions: Seq[InputPartition]
```

Used when:

* `DataSourceV2ScanExecBase` is requested for the partitions, groupedPartitions, supportsColumnar
### Input RDD

```scala
inputRDD: RDD[InternalRow]
```
### keyGroupedPartitioning

```scala
keyGroupedPartitioning: Option[Seq[Expression]]
```
### PartitionReaderFactory

```scala
readerFactory: PartitionReaderFactory
```

`PartitionReaderFactory` for partition readers (of the input partitions)

Used when:

* `BatchScanExec` physical operator is requested for an input RDD
* `ContinuousScanExec` and `MicroBatchScanExec` physical operators (Spark Structured Streaming) are requested for an `inputRDD`
* `DataSourceV2ScanExecBase` physical operator is requested to outputPartitioning or supportsColumnar
### Scan

```scala
scan: Scan
```
## Implementations

* `BatchScanExec`
* `ContinuousScanExec` (Spark Structured Streaming)
* `MicroBatchScanExec` (Spark Structured Streaming)
## Executing Physical Operator

```scala
doExecute(): RDD[InternalRow]
```

`doExecute` is part of the `SparkPlan` abstraction.

`doExecute`...FIXME
## doExecuteColumnar

```scala
doExecuteColumnar(): RDD[ColumnarBatch]
```

`doExecuteColumnar` is part of the `SparkPlan` abstraction.

`doExecuteColumnar`...FIXME
## Performance Metrics

```scala
metrics: Map[String, SQLMetric]
```

`metrics` is part of the `SparkPlan` abstraction.

`metrics` is the following SQLMetric, together with the customMetrics:

| Metric Name | web UI |
|-------------|--------|
| `numOutputRows` | number of output rows |
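The combination above amounts to a plain map merge. The `SimpleMetric` class and the custom metric name below are hypothetical stand-ins for Spark's `SQLMetric` and a data source's actual metrics, used only to sketch how the base metric map is extended with custom metrics:

```scala
// SimpleMetric is a hypothetical stand-in for Spark's SQLMetric.
case class SimpleMetric(description: String, var value: Long = 0L)

// The base metric every DataSourceV2 scan reports.
val baseMetrics = Map("numOutputRows" -> SimpleMetric("number of output rows"))

// Custom metrics as a data source might supply them (illustrative name).
val customMetrics = Map("scanTime" -> SimpleMetric("scan time"))

// metrics = the base metric merged with the custom ones.
val metrics: Map[String, SimpleMetric] = baseMetrics ++ customMetrics
```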
## Output Data Partitioning Requirements

```scala
outputPartitioning: physical.Partitioning
```

`outputPartitioning` is part of the `SparkPlan` abstraction.

`outputPartitioning`...FIXME
## Simple Node Description

```scala
simpleString(
  maxFields: Int): String
```

`simpleString` is part of the `TreeNode` abstraction.

`simpleString`...FIXME
## supportsColumnar

```scala
supportsColumnar: Boolean
```

`supportsColumnar` is part of the `SparkPlan` abstraction.

`supportsColumnar` is `true` if the PartitionReaderFactory can supportColumnarReads for all the input partitions. Otherwise, `supportsColumnar` is `false`.

`supportsColumnar` makes sure that either all the input partitions support columnar reads or none does, or throws an `IllegalArgumentException`:

```text
Cannot mix row-based and columnar input partitions.
```
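The all-or-none check can be sketched as follows. The traits below are minimal stand-ins for Spark's `InputPartition` and `PartitionReaderFactory` interfaces, not the actual classes:

```scala
// Minimal stand-ins (not Spark's actual interfaces), for illustration only.
trait InputPartition
trait PartitionReaderFactory {
  def supportColumnarReads(partition: InputPartition): Boolean
}

def supportsColumnar(
    factory: PartitionReaderFactory,
    partitions: Seq[InputPartition]): Boolean = {
  val columnar = partitions.count(factory.supportColumnarReads)
  // require throws IllegalArgumentException unless all or none are columnar.
  require(columnar == 0 || columnar == partitions.length,
    "Cannot mix row-based and columnar input partitions.")
  partitions.nonEmpty && columnar == partitions.length
}
```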
## Custom Metrics

```scala
customMetrics: Map[String, SQLMetric]
```

**Lazy Value**

`customMetrics` is a Scala lazy value to guarantee that the code to initialize it is executed once only (when accessed for the first time) and the computed value never changes afterwards. Learn more in the Scala Language Specification.

`customMetrics` requests the Scan for the supportedCustomMetrics that are then converted to SQLMetrics.

`customMetrics` is used when:

* `DataSourceV2ScanExecBase` is requested for the performance metrics
* `BatchScanExec` is requested for the inputRDD
* `ContinuousScanExec` is requested for the `inputRDD`
* `MicroBatchScanExec` is requested for the `inputRDD` (that creates a DataSourceRDD)
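The once-only initialization the Lazy Value note describes can be observed with a small self-contained sketch (the counter and the metric content are illustrative only, not Spark's):

```scala
// Counter to observe how many times the initializer runs.
var initCount = 0

// A lazy value: the block below runs once, on first access.
lazy val customMetricsSketch: Map[String, String] = {
  initCount += 1
  Map("scanTime" -> "total scan time")
}

customMetricsSketch // first access: the initializer runs
customMetricsSketch // subsequent accesses reuse the cached value
```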
## verboseStringWithOperatorId

```scala
verboseStringWithOperatorId(): String
```

`verboseStringWithOperatorId` is part of the `QueryPlan` abstraction.

`verboseStringWithOperatorId` requests the Scan for one of the following (`metaDataStr`):

* Metadata when SupportsMetadata
* Description, otherwise

In the end, `verboseStringWithOperatorId` is as follows (based on formattedNodeName and output):

```text
[formattedNodeName]
Output: [output]
[metaDataStr]
```
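The three-part description above could be assembled as in the sketch below; the function and parameter names are illustrative, not Spark's actual internals:

```scala
// Illustrative only: assembles the three-part node description shown above.
def nodeDescription(
    formattedNodeName: String,
    output: Seq[String],
    metaDataStr: String): String =
  s"""|$formattedNodeName
      |Output: ${output.mkString("[", ", ", "]")}
      |$metaDataStr""".stripMargin
```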