ColumnarToRowExec Physical Operator¶
ColumnarToRowExec is a ColumnarToRowTransition unary physical operator to translate an RDD of ColumnarBatches into an RDD of InternalRows in Columnar Processing.
ColumnarToRowExec supports Whole-Stage Java Code Generation.
ColumnarToRowExec requires that the child physical operator supports columnar processing.
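A quick way to see the operator at work is to explain a simple query over a columnar file source such as Parquet. The following sketch assumes a SparkSession in scope and uses a hypothetical Parquet path; the plan output is abbreviated.

```scala
// Hypothetical Parquet path; any columnar file source would do.
val q = spark.read.parquet("/tmp/people.parquet")
q.explain()
// == Physical Plan == (abbreviated)
// *(1) ColumnarToRow
// +- FileScan parquet ...
```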
Creating Instance¶
ColumnarToRowExec takes the following to be created:
- Child physical operator
ColumnarToRowExec is created when:
- ApplyColumnarRulesAndInsertTransitions physical optimization is executed
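The gist of that optimization can be sketched as follows (a simplification, not the actual rule): whenever a row-based parent needs rows from a child that supports columnar processing, the child is wrapped in a ColumnarToRowExec.

```scala
import org.apache.spark.sql.execution.{ColumnarToRowExec, SparkPlan}

// Simplified sketch (not the real ApplyColumnarRulesAndInsertTransitions rule):
// wrap a columnar-capable operator so its parent can consume InternalRows.
def toRowBased(plan: SparkPlan): SparkPlan =
  if (plan.supportsColumnar) ColumnarToRowExec(plan) else plan
```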
Performance Metrics¶
number of input batches¶
Number of input ColumnarBatches across all partitions (from the columnar execution of the child physical operator that produces an RDD[ColumnarBatch] and hence RDD partitions with rows "compressed" into ColumnarBatches)
The number of input ColumnarBatches is influenced by the spark.sql.parquet.columnarReaderBatchSize configuration property.
number of output rows¶
Total number of rows across all the ColumnarBatches in all partitions (from executeColumnar of the child physical operator)
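One way to observe both metrics is to lower the Parquet batch size, run a query, and read the metrics of the ColumnarToRowExec node from the executed plan. The following sketch uses a hypothetical Parquet path and assumes the node is directly visible in executedPlan (e.g. no adaptive re-planning):

```scala
import org.apache.spark.sql.execution.ColumnarToRowExec

// Smaller batches mean more input batches for the same number of output rows.
spark.conf.set("spark.sql.parquet.columnarReaderBatchSize", 1024)

val q = spark.read.parquet("/tmp/people.parquet") // hypothetical path
q.collect() // run the query so the metrics get populated

q.queryExecution.executedPlan
  .collectFirst { case c2r: ColumnarToRowExec => c2r }
  .foreach(_.metrics.foreach { case (name, metric) =>
    println(s"$name = ${metric.value}")
  })
```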
Executing Physical Operator¶
doExecute requests the child physical operator to executeColumnar (which is valid since the child supports columnar processing) and uses RDD.mapPartitionsInternal over the partitions of ColumnarBatches (Iterator[ColumnarBatch]) to "unpack" ("uncompress") them into InternalRows.
While "unpacking", doExecute
updates the number of input batches and number of output rows performance metrics.
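The per-partition "unpacking" can be pictured with the following standalone sketch (not Spark's exact source): an iterator of ColumnarBatches is flattened into an iterator of InternalRows while batches and rows are counted, which is what feeds the two performance metrics above.

```scala
import scala.collection.JavaConverters._
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.vectorized.ColumnarBatch

// Standalone sketch of the per-partition work: every ColumnarBatch is counted
// ("number of input batches"), its row count is added to "number of output
// rows", and its rows are emitted one by one.
def unpack(batches: Iterator[ColumnarBatch]): Iterator[InternalRow] = {
  var inputBatches = 0L
  var outputRows = 0L
  batches.flatMap { batch =>
    inputBatches += 1
    outputRows += batch.numRows()
    batch.rowIterator().asScala
  }
}
```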
Input RDDs¶
inputRDDs is the RDD of ColumnarBatches (RDD[ColumnarBatch]) from the child physical operator (when requested to executeColumnar).
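In other words, the child's columnar RDD is handed over as the single input RDD of whole-stage code generation. A paraphrased sketch of the override's shape (not the exact source) inside ColumnarToRowExec:

```scala
// Paraphrased shape of the override: the child's columnar RDD is returned as
// the single input RDD; the cast merely satisfies the row-based signature.
override def inputRDDs(): Seq[RDD[InternalRow]] =
  Seq(child.executeColumnar().asInstanceOf[RDD[InternalRow]])
```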
canCheckLimitNotReached Flag¶
Signature
canCheckLimitNotReached: Boolean
canCheckLimitNotReached is part of the CodegenSupport abstraction.
canCheckLimitNotReached is always enabled (true).
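In code, the override amounts to returning true unconditionally (a sketch consistent with the description above):

```scala
// Always enabled for ColumnarToRowExec (as described above).
override def canCheckLimitNotReached: Boolean = true
```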