ColumnarToRowExec Physical Operator¶

ColumnarToRowExec is a ColumnarToRowTransition unary physical operator to translate an RDD of ColumnarBatches into an RDD of InternalRows in Columnar Processing.

ColumnarToRowExec supports Whole-Stage Java Code Generation.

ColumnarToRowExec requires that the child physical operator supports columnar processing.

Creating Instance¶

ColumnarToRowExec takes the following to be created:

Child physical operator

ColumnarToRowExec is created when:

ApplyColumnarRulesAndInsertTransitions physical optimization is executed

Performance Metrics¶

number of input batches¶

Number of input ColumnarBatches across all partitions (from columnar execution of the child physical operator that produces RDD[ColumnarBatch] and hence RDD partitions with rows "compressed" into ColumnarBatches)

The number of input ColumnarBatches is influenced by spark.sql.parquet.columnarReaderBatchSize configuration property.

number of output rows¶

Total of the number of rows in every ColumnarBatch across all partitions (of executeColumnar of the child physical operator)

Executing Physical Operator¶

Signature

doExecute(): RDD[InternalRow]

doExecute is part of the SparkPlan abstraction.

doExecute requests the child physical operator to executeColumnar (which is valid since it does support columnar processing) and RDD.mapPartitionsInternal over partitions of ColumnarBatches (Iterator[ColumnarBatch]) to "unpack" / "uncompress" them to InternalRows.

While "unpacking", doExecute updates the number of input batches and number of output rows performance metrics.

Input RDDs¶

Signature

inputRDDs(): Seq[RDD[InternalRow]]

inputRDDs is part of the CodegenSupport abstraction.

inputRDDs is the RDD of ColumnarBatches (RDD[ColumnarBatch]) from the child physical operator (when requested to executeColumnar).

canCheckLimitNotReached Flag¶

Signature

canCheckLimitNotReached: Boolean

canCheckLimitNotReached is part of the CodegenSupport abstraction.

canCheckLimitNotReached is always enabled (true).