== DataSourceV2ScanExec Leaf Physical Operator


As of this commit, DataSourceV2ScanExec is no longer available in Spark 3.0.0, and this page will be removed once DataSourceV2ScanExecBase takes over.

DataSourceV2ScanExec is a leaf physical operator that represents a DataSourceV2Relation logical operator at execution time.

DataSourceV2ScanExec supports ColumnarBatchScan with vectorized batch decoding.

[[inputRDDs]] DataSourceV2ScanExec gives its single input RDD as the only input RDD of internal rows (when the WholeStageCodegenExec physical operator is executed).
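The "single input RDD" contract can be pictured with a minimal, Spark-free sketch. All the types below (`ToyRow`, `ToyRDD`, `ToyCodegenSupport`, `ToyScanExec`) are hypothetical stand-ins invented for illustration, not Spark's classes; the sketch only assumes that a leaf scan has no children and therefore hands back its own RDD wrapped in a one-element sequence.

```scala
// Hypothetical, simplified stand-ins; these are NOT Spark's actual classes.
final case class ToyRow(values: Seq[Any])

final case class ToyRDD[T](partitions: Seq[Seq[T]]) {
  // Flattens all partitions into a single local collection.
  def collect(): Seq[T] = partitions.flatten
}

// Toy counterpart of an operator that takes part in whole-stage code generation.
trait ToyCodegenSupport {
  def inputRDDs(): Seq[ToyRDD[ToyRow]]
}

// A leaf scan has no child operators, so its own RDD is the only input.
final class ToyScanExec(inputRDD: ToyRDD[ToyRow]) extends ToyCodegenSupport {
  override def inputRDDs(): Seq[ToyRDD[ToyRow]] = Seq(inputRDD)
}

object InputRDDsSketch {
  def main(args: Array[String]): Unit = {
    val rdd = ToyRDD(Seq(
      Seq(ToyRow(Seq(1, "a"))),
      Seq(ToyRow(Seq(2, "b")))))
    val scan = new ToyScanExec(rdd)
    // Exactly one input RDD, holding the rows of both partitions.
    println(scan.inputRDDs().size)
    println(scan.inputRDDs().head.collect().size)
  }
}
```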

=== Creating Instance

DataSourceV2ScanExec takes the following to be created:

  • [[output]] Output schema (as a collection of AttributeReferences)
  • [[reader]] FIXME

DataSourceV2ScanExec is created exclusively when the DataSourceV2Strategy execution planning strategy is executed (i.e. applied to a logical plan) and finds a DataSourceV2Relation logical operator.
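The planning step above can be pictured as a pattern match from a logical operator to a physical one. The following is a minimal sketch under stated assumptions: every name in it (`ToyV2Relation`, `ToyOtherRelation`, `ToyV2ScanExec`, `ToyV2Strategy`) is hypothetical, the reader is reduced to a plain string, and whatever else the real DataSourceV2Strategy does is left out.

```scala
// Hypothetical, simplified stand-ins; these are NOT Spark's actual classes.
sealed trait ToyLogicalPlan
final case class ToyV2Relation(output: Seq[String], reader: String) extends ToyLogicalPlan
final case class ToyOtherRelation(name: String) extends ToyLogicalPlan

sealed trait ToyPhysicalPlan
final case class ToyV2ScanExec(output: Seq[String], reader: String) extends ToyPhysicalPlan

object ToyV2Strategy {
  // Returns the candidate physical plans, or Nil when the strategy does not apply.
  def apply(plan: ToyLogicalPlan): Seq[ToyPhysicalPlan] = plan match {
    // The scan operator is created only when the V2 relation is found.
    case ToyV2Relation(output, reader) => Seq(ToyV2ScanExec(output, reader))
    case _ => Nil
  }
}

object StrategySketch {
  def main(args: Array[String]): Unit = {
    // A V2 relation is planned as a scan; any other operator is left to other strategies.
    println(ToyV2Strategy(ToyV2Relation(Seq("id", "name"), "toy-reader")))
    println(ToyV2Strategy(ToyOtherRelation("t")))
  }
}
```

Returning an empty sequence for operators the strategy does not recognize keeps it composable with other strategies in this toy model.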

=== [[doExecute]] Executing Physical Operator (Generating RDD[InternalRow]) -- doExecute Method

[source, scala]
----
doExecute(): RDD[InternalRow]
----


doExecute is part of the SparkPlan abstraction.

=== [[internal-properties]] Internal Properties

[cols="30m,70",options="header",width="100%"]
|===
| Name
| Description

| batchPartitions a| [[batchPartitions]] Input partitions of ColumnarBatches (Seq[InputPartition[ColumnarBatch]])

| partitions a| [[partitions]] Input partitions of InternalRows (Seq[InputPartition[InternalRow]])