Skip to content

Whole-Stage Java Code Generation

Whole-Stage Java Code Generation (Whole-Stage CodeGen) is a physical query optimization in Spark SQL that fuses multiple physical operators (as a subtree of plans that support code generation) together into a single Java function.

Whole-Stage Java Code Generation improves the execution performance of a query by collapsing a query tree into a single optimized function that eliminates virtual function calls and leverages CPU registers for intermediate data.

Note

Whole-Stage Code Generation is used by some modern massively parallel processing (MPP) databases to achieve a better query execution performance.

See Efficiently Compiling Efficient Query Plans for Modern Hardware (PDF).

Columnar Execution

The Whole-Stage Code Generation framework is row-based.

The input RDDs of the physical operators of a whole-stage pipeline are all RDD[InternalRow]s. The output RDD of a whole-stage pipeline is also an RDD[InternalRow]. However, the input to a whole-stage code gen stage can be columnar (RDD[ColumnarBatch]).

If a physical operator supports columnar execution, it can't at the same time support whole-stage-codegen.

CodegenSupport Physical Operators

Physical operators that support code generation extend CodegenSupport (and keep supportCodegen flag enabled).

ObjectType

Whole-Stage Java Code Generation does not support (skips) physical operators that produce a domain object (the DataType of the output expression is ObjectType) as domain objects cannot be written into an UnsafeRow.

AggregateCodegenSupport

For aggregation, Whole-Stage Code Generation is supported by AggregateCodegenSupport physical operators (HashAggregateExec and SortAggregateExec) and only when there are no ImperativeAggregates (supportCodegen).

In other words, Whole-Stage Code Generation will only be used for aggreation for DeclarativeAggregate and TypedAggregateExpression expressions.

Fast Driver-Local Collect/Take Paths

The following physical operators cannot be a root of WholeStageCodegen to support the fast driver-local collect/take paths:

WholeStageCodegenExec Physical Operator

WholeStageCodegenExec physical operator

Janino

Janino is used to compile a Java source code into a Java class at runtime.

Debugging Query Execution

Debugging Query Execution facility allows deep dive into the whole-stage code generation.

val q = spark.range(10).where('id === 4)
scala> q.queryExecution.debug.codegen
Found 1 WholeStageCodegen subtrees.
== Subtree 1 / 1 ==
*(1) Filter (id#3L = 4)
+- *(1) Range (0, 10, step=1, splits=8)

Generated code:
/* 001 */ public Object generate(Object[] references) {
/* 002 */   return new GeneratedIteratorForCodegenStage1(references);
/* 003 */ }
/* 004 */
/* 005 */ final class GeneratedIteratorForCodegenStage1 extends org.apache.spark.sql.execution.BufferedRowIterator {
...
val q = spark.range(10).where('id === 4)
import org.apache.spark.sql.execution.debug._
scala> q.debugCodegen()
Found 1 WholeStageCodegen subtrees.
== Subtree 1 / 1 ==
*(1) Filter (id#0L = 4)
+- *(1) Range (0, 10, step=1, splits=8)

Generated code:
/* 001 */ public Object generate(Object[] references) {
/* 002 */   return new GeneratedIteratorForCodegenStage1(references);
/* 003 */ }
/* 004 */
/* 005 */ final class GeneratedIteratorForCodegenStage1 extends org.apache.spark.sql.execution.BufferedRowIterator {
...

CollapseCodegenStages Physical Preparation Rule

At query execution planning, CollapseCodegenStages physical preparation rule finds the physical query plans that support codegen and collapses them together as a WholeStageCodegen (possibly with InputAdapter in-between for physical operators with no support for Java code generation).

CollapseCodegenStages is part of the sequence of physical preparation rules QueryExecution.preparations that will be applied in order to the physical plan before execution.

debugCodegen

debugCodegen or QueryExecution.debug.codegen methods allow to access the generated Java source code for a structured query.

As of Spark 3.0.0, debugCodegen prints Java bytecode statistics of generated classes (and compiled by Janino).

import org.apache.spark.sql.execution.debug._
val q = "SELECT sum(v) FROM VALUES(1) t(v)"
scala> sql(q).debugCodegen
Found 2 WholeStageCodegen subtrees.
== Subtree 1 / 2 (maxMethodCodeSize:124; maxConstantPoolSize:130(0.20% used); numInnerClasses:0) ==
...
== Subtree 2 / 2 (maxMethodCodeSize:139; maxConstantPoolSize:137(0.21% used); numInnerClasses:0) ==

spark.sql.codegen.wholeStage

Whole-Stage Code Generation is controlled by spark.sql.codegen.wholeStage Spark internal property.

Whole-Stage Code Generation is on by default.

assert(spark.sessionState.conf.wholeStageEnabled)

Code Generation Paths

Code generation paths were coined in this commit.

Tip

Learn more in SPARK-12795 Whole stage codegen.

Non-Whole-Stage-Codegen Path

Produce Path

Whole-stage-codegen "produce" path

A physical operator with CodegenSupport can generate Java source code to process the rows from input RDDs.

Consume Path

Whole-stage-codegen "consume" path

BenchmarkWholeStageCodegen

BenchmarkWholeStageCodegen class provides a benchmark to measure whole stage codegen performance.

You can execute it using the command:

build/sbt 'sql/testOnly *BenchmarkWholeStageCodegen'

Note

You need to un-ignore tests in BenchmarkWholeStageCodegen by replacing ignore with test.

$ build/sbt 'sql/testOnly *BenchmarkWholeStageCodegen'
...
Running benchmark: range/limit/sum
  Running case: range/limit/sum codegen=false
22:55:23.028 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
  Running case: range/limit/sum codegen=true

Java HotSpot(TM) 64-Bit Server VM 1.8.0_77-b03 on Mac OS X 10.10.5
Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz

range/limit/sum:                    Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
-------------------------------------------------------------------------------------------
range/limit/sum codegen=false             376 /  433       1394.5           0.7       1.0X
range/limit/sum codegen=true              332 /  388       1581.3           0.6       1.1X

[info] - range/limit/sum (10 seconds, 74 milliseconds)