ColumnarToRowExec Physical Operator¶
ColumnarToRowExec is a ColumnarToRowTransition unary physical operator that translates an RDD of ColumnarBatches (RDD[ColumnarBatch]) into an RDD of InternalRows (RDD[InternalRow]) in Columnar Processing.
ColumnarToRowExec supports Whole-Stage Java Code Generation.
ColumnarToRowExec requires that the child physical operator supports columnar processing.
ColumnarToRowExec takes the following to be created:
- Child physical operator
ColumnarToRowExec is created when:
- ApplyColumnarRulesAndInsertTransitions physical optimization is executed
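The operator's overall shape can be sketched as follows (a simplified sketch, not the full Spark implementation; the real class also defines metrics and codegen support):

```scala
// Simplified sketch of ColumnarToRowExec (details omitted).
// The requirement that the child supports columnar processing is
// enforced with an assertion on supportsColumnar.
case class ColumnarToRowExec(child: SparkPlan)
    extends ColumnarToRowTransition with CodegenSupport {

  // The child has to be able to produce ColumnarBatches.
  assert(child.supportsColumnar)

  // The row-based output schema is exactly the child's schema.
  override def output: Seq[Attribute] = child.output
}
```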
number of input batches¶
Number of input ColumnarBatches across all partitions (from columnar execution of the child physical operator that produces an RDD[ColumnarBatch] and hence RDD partitions with rows "compressed" into ColumnarBatches)
The number of input ColumnarBatches is influenced by the spark.sql.parquet.columnarReaderBatchSize configuration property.
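For the vectorized Parquet reader, the batch size (and hence this metric) can be tuned as sketched below (assuming the vectorized reader is used; 4096 is the default batch size):

```scala
// Smaller batches mean more ColumnarBatches for the same number of rows,
// and so a higher "number of input batches" metric.
spark.conf.set("spark.sql.parquet.columnarReaderBatchSize", 2048) // default: 4096
```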
number of output rows¶
Total number of rows across all ColumnarBatches in all partitions (from executeColumnar of the child physical operator)
Executing Physical Operator¶
doExecute is part of the SparkPlan abstraction.
doExecute requests the child physical operator to executeColumnar (which is valid since the child supports columnar processing) and uses RDD.mapPartitionsInternal over the partitions of ColumnarBatches (Iterator[ColumnarBatch]) to "unpack" ("uncompress") them into InternalRows.
doExecute updates the number of input batches and number of output rows performance metrics.
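The steps above can be sketched as follows (simplified from the actual implementation; the metric names are assumed):

```scala
import scala.jdk.CollectionConverters._

// Sketch of doExecute: execute the child in columnar mode, then flatten
// every ColumnarBatch into its rows while updating the two metrics.
override protected def doExecute(): RDD[InternalRow] = {
  val numInputBatches = longMetric("numInputBatches")
  val numOutputRows = longMetric("numOutputRows")
  child.executeColumnar().mapPartitionsInternal { batches =>
    batches.flatMap { batch =>
      numInputBatches += 1
      numOutputRows += batch.numRows()
      batch.rowIterator().asScala // row-by-row view over the batch
    }
  }
}
```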
inputRDDs¶
inputRDDs is part of the CodegenSupport abstraction.
inputRDDs is the RDD of ColumnarBatches (RDD[ColumnarBatch]) from the child physical operator (when requested to executeColumnar).
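inputRDDs can be sketched as below (this mirrors the cast in the Spark source, where the columnar RDD is exposed as an RDD[InternalRow] thanks to type erasure):

```scala
// The child's RDD[ColumnarBatch] is handed to whole-stage codegen as-is;
// the cast compiles because of type erasure (the generated code consumes
// the batches, not individual rows).
override def inputRDDs(): Seq[RDD[InternalRow]] =
  Seq(child.executeColumnar().asInstanceOf[RDD[InternalRow]])
```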
canCheckLimitNotReached¶
canCheckLimitNotReached is part of the CodegenSupport abstraction.
canCheckLimitNotReached is always enabled (true).