ColumnarToRowExec Physical Operator

ColumnarToRowExec is a ColumnarToRowTransition unary physical operator that translates an RDD of ColumnarBatches (RDD[ColumnarBatch]) into an RDD of InternalRows (RDD[InternalRow]) in Columnar Processing.

ColumnarToRowExec supports Whole-Stage Java Code Generation.

ColumnarToRowExec requires that the child physical operator supports columnar processing.
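As a quick illustration (a minimal sketch, assuming a SparkSession available as spark and a hypothetical /tmp/demo Parquet dataset), ColumnarToRowExec shows up as a ColumnarToRow node in the physical plan of a query over a columnar data source such as Parquet:

// A minimal sketch; the /tmp/demo path is hypothetical.
// The vectorized Parquet scan produces ColumnarBatches, so Spark inserts a
// ColumnarToRow transition before row-based operators.
spark.range(5).write.mode("overwrite").parquet("/tmp/demo")
spark.read.parquet("/tmp/demo").explain()
// Expect a physical plan along the lines of:
// *(1) ColumnarToRow
// +- FileScan parquet [id#0L] ...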

Creating Instance

ColumnarToRowExec takes a single child physical operator (SparkPlan) to be created.

ColumnarToRowExec is created when the ApplyColumnarRulesAndInsertTransitions physical optimization is executed (to insert columnar-to-row and row-to-columnar transitions into a physical query plan).
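As a sketch (reusing the hypothetical /tmp/demo Parquet dataset from above), the operator and its child physical operator can be inspected in an executed physical plan:

import org.apache.spark.sql.execution.ColumnarToRowExec

// A minimal sketch: locate the ColumnarToRowExec node in an executed plan and
// print the class of its (columnar) child physical operator.
// Depending on the Spark version and the query, the node may sit inside an
// adaptive or whole-stage-codegen wrapper.
val plan = spark.read.parquet("/tmp/demo").queryExecution.executedPlan
val c2r = plan.collectFirst { case op: ColumnarToRowExec => op }
c2r.foreach(op => println(op.child.getClass.getName))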

Performance Metrics

number of input batches

Number of input ColumnarBatches across all partitions (from the columnar execution of the child physical operator, which produces an RDD[ColumnarBatch] and hence RDD partitions with rows "compressed" into ColumnarBatches)

The number of input ColumnarBatches is influenced by the spark.sql.parquet.columnarReaderBatchSize configuration property.
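For example (a hedged sketch over the hypothetical /tmp/demo dataset), lowering the batch size makes the vectorized Parquet reader produce more, smaller ColumnarBatches and so increases this metric for the same input data:

// A sketch; 4096 is the default of spark.sql.parquet.columnarReaderBatchSize.
spark.conf.set("spark.sql.parquet.columnarReaderBatchSize", "1024")
spark.read.parquet("/tmp/demo").count()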

number of output rows

Total number of rows across all the ColumnarBatches in all partitions (of the RDD[ColumnarBatch] from executeColumnar of the child physical operator)

Executing Physical Operator

Signature
doExecute(): RDD[InternalRow]

doExecute is part of the SparkPlan abstraction.

doExecute requests the child physical operator to executeColumnar (which is valid since the child does support columnar processing) and uses RDD.mapPartitionsInternal over the partitions of ColumnarBatches (Iterator[ColumnarBatch]) to "unpack" ("uncompress") them into InternalRows.

While "unpacking", doExecute updates the number of input batches and number of output rows performance metrics.

Input RDDs

Signature
inputRDDs(): Seq[RDD[InternalRow]]

inputRDDs is part of the CodegenSupport abstraction.

inputRDDs is a single-element collection with the RDD of ColumnarBatches (RDD[ColumnarBatch]) from the child physical operator (when requested to executeColumnar).
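A sketch of what the override amounts to (assuming the columnar RDD is simply cast, since CodegenSupport declares the element type as InternalRow):

// Sketch: the child's columnar RDD is exposed as the single input RDD for
// whole-stage codegen; the cast is needed because the declared type is
// RDD[InternalRow] even though the elements are ColumnarBatches.
override def inputRDDs(): Seq[RDD[InternalRow]] =
  Seq(child.executeColumnar().asInstanceOf[RDD[InternalRow]])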

canCheckLimitNotReached Flag

Signature
canCheckLimitNotReached: Boolean

canCheckLimitNotReached is part of the CodegenSupport abstraction.

canCheckLimitNotReached is always enabled (true).
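In code, this is simply a constant override (a sketch):

// Sketch of the flag: the generated code may always check whether the limit
// has been reached.
override def canCheckLimitNotReached: Boolean = true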