DataSourceV2ScanExec Leaf Physical Operator


As of this commit, DataSourceV2ScanExec is no longer available in Spark 3.0.0, and this page will be removed once DataSourceV2ScanExecBase takes over.

DataSourceV2ScanExec is a leaf physical operator that represents a DataSourceV2Relation logical operator at execution time.
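To make "leaf" concrete, here is a minimal sketch using hypothetical stand-in classes (PhysicalNode, LeafScan and Filter are illustrations, not Spark's API). A leaf operator has no child operators, so a scan such as DataSourceV2ScanExec sits at the bottom of a physical plan tree and produces rows itself.

```scala
// Hypothetical, minimal model of a physical plan tree (not Spark's classes).
sealed trait PhysicalNode {
  def name: String
  def children: Seq[PhysicalNode]
}

// A leaf operator has no children: it produces rows itself, like a scan.
final case class LeafScan(table: String) extends PhysicalNode {
  val name: String = s"Scan $table"
  val children: Seq[PhysicalNode] = Nil
}

// A unary operator consumes the rows of its single child.
final case class Filter(condition: String, child: PhysicalNode) extends PhysicalNode {
  val name: String = s"Filter $condition"
  val children: Seq[PhysicalNode] = Seq(child)
}

// Collect the leaf operators of a plan, depth-first.
def leaves(node: PhysicalNode): Seq[PhysicalNode] =
  if (node.children.isEmpty) Seq(node) else node.children.flatMap(leaves)

val plan: PhysicalNode = Filter("id > 5", LeafScan("events"))
println(leaves(plan).map(_.name)) // List(Scan events)
```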

DataSourceV2ScanExec is a ColumnarBatchScan and supports vectorized batch decoding.
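The choice between columnar and row-based reading can be sketched as follows. The reader traits (RowReader, BatchReader) and the enableBatchRead flag are hypothetical stand-ins, assuming the reader must both support and enable batch reads before the scan switches to batch mode; they are not the Spark DataSource V2 interfaces.

```scala
// Hypothetical reader abstractions (not Spark's DataSource V2 interfaces).
trait RowReader {
  // Row-at-a-time access: one Seq of column values per row.
  def rows(): Iterator[Seq[Any]]
}

trait BatchReader extends RowReader {
  // The reader can opt out of batch mode even though it supports it.
  def enableBatchRead: Boolean
  // Each batch holds one Vector per column, all of the same length.
  def batches(): Iterator[Vector[Vector[Any]]]
}

// Columnar (vectorized) mode only when the reader supports and enables it.
def supportsBatch(reader: RowReader): Boolean = reader match {
  case r: BatchReader if r.enableBatchRead => true
  case _                                   => false
}

// Count rows one batch at a time in batch mode, one row at a time otherwise.
def countRows(reader: RowReader): Int = reader match {
  case r: BatchReader if supportsBatch(r) =>
    r.batches().map(batch => batch.headOption.map(_.size).getOrElse(0)).sum
  case r =>
    r.rows().size
}

// Example reader with a single 3-row batch; rows() transposes each batch.
def columnarReader(enabled: Boolean): BatchReader = new BatchReader {
  val enableBatchRead: Boolean = enabled
  def batches(): Iterator[Vector[Vector[Any]]] =
    Iterator(Vector(Vector(1, 2, 3), Vector("a", "b", "c")))
  def rows(): Iterator[Seq[Any]] = batches().flatMap(_.transpose)
}

println(supportsBatch(columnarReader(enabled = true)))  // true
println(supportsBatch(columnarReader(enabled = false))) // false
```

Both paths return the same row count; batch mode simply does the work per column batch instead of per row.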

DataSourceV2ScanExec returns a single RDD of internal rows as its only input RDD (inputRDDs) when the WholeStageCodegenExec physical operator is executed.
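A minimal sketch of the inputRDDs contract, assuming a hypothetical FakeRDD (a list of partitions) in place of Spark's RDD. A leaf scan reads from exactly one source, so it reports a one-element collection; the toy stage driver, loosely modeled on WholeStageCodegenExec asking its child for inputRDDs, then reads from it. CodegenInputSketch, LeafScanSketch and runStage are illustrative names, and the driver only handles the single-input case.

```scala
// Hypothetical stand-in for an RDD: a sequence of partitions, each a sequence of rows.
final case class FakeRDD[T](partitions: Seq[Seq[T]]) {
  def collect(): Seq[T] = partitions.flatten
}

// Hypothetical mirror of the inputRDDs contract: an operator taking part in
// whole-stage code generation reports the RDD(s) its generated code reads.
trait CodegenInputSketch[T] {
  def inputRDDs(): Seq[FakeRDD[T]]
}

// A leaf scan reads from exactly one source, so it reports one RDD.
final class LeafScanSketch[T](inputRDD: FakeRDD[T]) extends CodegenInputSketch[T] {
  def inputRDDs(): Seq[FakeRDD[T]] = Seq(inputRDD)
}

// Toy stage driver: asks its child for input RDDs and reads the single one.
def runStage[T](child: CodegenInputSketch[T]): Seq[T] = {
  val rdds = child.inputRDDs()
  require(rdds.size == 1, s"this sketch expects one input RDD, got ${rdds.size}")
  rdds.head.collect()
}

val scan = new LeafScanSketch(FakeRDD(Seq(Seq(1, 2), Seq(3))))
println(runStage(scan)) // List(1, 2, 3)
```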

Creating Instance

DataSourceV2ScanExec takes the following to be created:

  • Output schema (as a collection of AttributeReferences)
  • Reader (DataSourceReader)
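The two constructor arguments can be pictured with a minimal sketch. Attr, SketchReader and ScanExecSketch are hypothetical stand-ins for AttributeReference, the reader and DataSourceV2ScanExec, not Spark classes; the point is that the output schema names the columns of every row the reader produces.

```scala
// Hypothetical stand-in for AttributeReference: a named, typed output column.
final case class Attr(name: String, dataType: String)

// Hypothetical stand-in for the reader the operator is created with.
trait SketchReader {
  def rows(): Iterator[Seq[Any]]
}

// Hypothetical stand-in for the operator: created with an output schema and a reader.
final case class ScanExecSketch(output: Seq[Attr], reader: SketchReader) {
  // Pair every value the reader produces with the output column at the same position.
  def execute(): List[Map[String, Any]] =
    reader.rows().map(row => output.map(_.name).zip(row).toMap).toList
}

val reader = new SketchReader {
  def rows(): Iterator[Seq[Any]] = Iterator(Seq(1, "a"), Seq(2, "b"))
}
val scan = ScanExecSketch(Seq(Attr("id", "int"), Attr("name", "string")), reader)
println(scan.execute()) // List(Map(id -> 1, name -> a), Map(id -> 2, name -> b))
```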

DataSourceV2ScanExec is created exclusively when the DataSourceV2Strategy execution planning strategy is executed (i.e. applied to a logical plan) and finds a DataSourceV2Relation logical operator.
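The shape of such a planning strategy can be sketched with hypothetical, simplified plan classes (V2Relation, LogicalFilter, V2ScanExec and V2StrategySketch are illustrations, not Spark's classes). As with Spark's strategies, it returns candidate physical plans for the logical operator it recognizes and Nil for anything else, letting the planner try other strategies.

```scala
// Hypothetical, simplified logical plan nodes (not Spark's classes).
sealed trait LogicalPlan
final case class V2Relation(output: Seq[String], reader: String) extends LogicalPlan
final case class LogicalFilter(condition: String, child: LogicalPlan) extends LogicalPlan

// Hypothetical physical operator created for a V2Relation.
final case class V2ScanExec(output: Seq[String], reader: String)

// A strategy returns candidate physical plans for the logical node it
// recognizes, and Nil otherwise so the planner can try other strategies.
object V2StrategySketch {
  def apply(plan: LogicalPlan): Seq[V2ScanExec] = plan match {
    case V2Relation(output, reader) => Seq(V2ScanExec(output, reader))
    case _                          => Nil
  }
}

val relation = V2Relation(Seq("id", "name"), "reader-1")
println(V2StrategySketch(relation).nonEmpty)                         // true
println(V2StrategySketch(LogicalFilter("id > 5", relation)).isEmpty) // true
```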

Executing Physical Operator (Generating RDD[InternalRow]): doExecute Method

```scala
override protected def doExecute(): RDD[InternalRow]
```

doExecute is part of the SparkPlan abstraction.
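In SparkPlan, the public execute method is final and delegates to the protected doExecute that each operator overrides. A minimal sketch of that template-method relationship, with hypothetical names (SparkPlanSketch, ScanSketch) and a simplified one-time doPrepare hook standing in for the extra work Spark does before calling doExecute:

```scala
// Hypothetical sketch (not Spark's SparkPlan): a final, public execute()
// runs a one-time preparation step and then delegates to the protected
// doExecute() that every concrete operator overrides.
abstract class SparkPlanSketch {
  private var prepared = false

  final def execute(): Seq[Seq[Any]] = {
    if (!prepared) {
      doPrepare()
      prepared = true
    }
    doExecute()
  }

  // Optional hook for one-time setup; does nothing by default.
  protected def doPrepare(): Unit = ()

  // Operator-specific work: produce the result rows.
  protected def doExecute(): Seq[Seq[Any]]
}

// A leaf scan's doExecute simply returns what its source yields.
final class ScanSketch(source: () => Seq[Seq[Any]]) extends SparkPlanSketch {
  var prepareCalls = 0
  override protected def doPrepare(): Unit = prepareCalls += 1
  override protected def doExecute(): Seq[Seq[Any]] = source()
}

val scan = new ScanSketch(() => Seq(Seq(1, "a"), Seq(2, "b")))
scan.execute()
scan.execute()
println(scan.prepareCalls) // 1
```

Callers only ever see execute; an operator customizes behavior solely by overriding doExecute (and optionally the preparation hook).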

Internal Properties

  • batchPartitions: input partitions of ColumnarBatches (Seq[InputPartition[ColumnarBatch]])
  • partitions: input partitions of InternalRows (Seq[InputPartition[InternalRow]])
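The two kinds of input partitions can be illustrated with a minimal sketch; InputPartitionSketch and Batch are hypothetical stand-ins for InputPartition and ColumnarBatch, not Spark's API. Each partition creates a reader for its slice of the data, and expanding every batch of a batch partition into rows yields the same rows as the equivalent row partition.

```scala
// Hypothetical stand-in for an input partition: it describes one slice of the
// data and can create a reader for that slice (not Spark's API).
trait InputPartitionSketch[T] {
  def createReader(): Iterator[T]
}

// Hypothetical stand-in for a columnar batch: one Vector per column.
final case class Batch(columns: Vector[Vector[Any]]) {
  def numRows: Int = columns.headOption.map(_.size).getOrElse(0)
  // Expand the batch into rows, one Vector of column values per row.
  def rowIterator: Iterator[Vector[Any]] =
    Iterator.range(0, numRows).map(i => columns.map(column => column(i)))
}

// The same two rows, exposed once as rows and once as a single batch.
val partitions: Seq[InputPartitionSketch[Vector[Any]]] = Seq(
  new InputPartitionSketch[Vector[Any]] {
    def createReader(): Iterator[Vector[Any]] = Iterator(Vector(1, "a"), Vector(2, "b"))
  }
)
val batchPartitions: Seq[InputPartitionSketch[Batch]] = Seq(
  new InputPartitionSketch[Batch] {
    def createReader(): Iterator[Batch] =
      Iterator(Batch(Vector(Vector(1, 2), Vector("a", "b"))))
  }
)

// Read every partition in order, collecting whatever each reader yields.
def readAll[T](ps: Seq[InputPartitionSketch[T]]): List[T] =
  ps.flatMap(_.createReader()).toList

val fromRows = readAll(partitions)
val fromBatches = readAll(batchPartitions).flatMap(_.rowIterator)
println(fromRows == fromBatches) // true
```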