

CodegenContext is a context for Whole-Stage Java Code Generation to track objects (that could be passed into generated Java code).

Creating Instance

CodegenContext takes no arguments to be created.

CodegenContext is created when:


newCodeGenContext(): CodegenContext

newCodeGenContext creates a new CodegenContext.

newCodeGenContext is used when:


references: mutable.ArrayBuffer[Any]

CodegenContext uses the references collection to hold objects that could be passed into a generated class.

A new reference is added:

Used when:


addReferenceObj(
  objName: String,
  obj: Any,
  className: String = null): String

addReferenceObj adds the given obj to the references registry and returns the following code text:

(([clsName]) references[[idx]] /* [objName] */)
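As a minimal sketch of the behaviour described above, the following registers an object and inspects the returned code text. It assumes a spark-shell session (or any Scala REPL with Spark SQL on the classpath). The `greeting` name and the `"hello"` value are made up for illustration.

```scala
import org.apache.spark.sql.catalyst.expressions.codegen.CodegenContext

val ctx = new CodegenContext

// Hypothetical object and name, for illustration only
val obj = "hello"
val idx = ctx.references.length

// className is given explicitly so the cast in the code text is predictable
val code = ctx.addReferenceObj("greeting", obj, className = "java.lang.String")

// The object is appended to the references registry
assert(ctx.references.length == idx + 1)
assert(ctx.references.last == obj)

// The code text refers to the registry slot and carries the name as a comment
assert(code.contains(s"references[$idx]"))
assert(code.contains("java.lang.String"))
assert(code.contains("greeting"))

println(code)
```

The returned string is only a code fragment. It becomes meaningful once it is spliced into a generated class that receives the references array.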

addReferenceObj is used when:

  • AvroDataToCatalyst is requested to doGenCode
  • CatalystDataToAvro is requested to doGenCode
  • CastBase is requested to castToStringCode, castToDateCode, castToTimestampCode and castToTimestampNTZCode
  • Catalyst Expressions are requested to doGenCode
  • BroadcastHashJoinExec physical operator is requested to prepareBroadcast
  • BroadcastNestedLoopJoinExec physical operator is requested to prepareBroadcast
  • ShuffledHashJoinExec physical operator is requested to prepareRelation
  • SortExec physical operator is requested to doProduce
  • SortMergeJoinExec physical operator is requested to doProduce
  • HashAggregateExec physical operator is requested to doProduceWithKeys
  • CodegenSupport is requested to metricTerm
  • HashMapGenerator is requested to initializeAggregateHashMap


generateExpressions(
  expressions: Seq[Expression],
  doSubexpressionElimination: Boolean = false): Seq[ExprCode]

generateExpressions generates Java source code for Code-Generated Evaluation of multiple Catalyst Expressions (with optional subexpression elimination).

With the given doSubexpressionElimination enabled, generateExpressions performs subexpressionElimination (with the given expressions).

In the end, generateExpressions requests every Expression (in the given expressions) to generate Java source code for code-generated (non-interpreted) expression evaluation.
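The flow above can be sketched in a spark-shell session (assumed, with Spark SQL on the classpath). The two expressions are made up for illustration, and subexpression elimination is left at its default (disabled).

```scala
import org.apache.spark.sql.catalyst.expressions.{Add, Literal}
import org.apache.spark.sql.catalyst.expressions.codegen.CodegenContext

val ctx = new CodegenContext

// Two made-up expressions, for illustration only
val exprs = Seq(Literal(1), Add(Literal(1), Literal(2)))

// doSubexpressionElimination is not given, so it stays false
val codes = ctx.generateExpressions(exprs)

// One ExprCode per input expression, in the same order
assert(codes.length == exprs.length)

// Each ExprCode carries the generated code and the names of the null flag and value
codes.foreach { c =>
  println(s"isNull: ${c.isNull}, value: ${c.value}")
  println(c.code)
}
```

The exact variable names and code text in the output depend on the Spark version, so none are shown here.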

generateExpressions is used when:


subexpressionElimination(
  expressions: Seq[Expression]): Unit



subexpressionEliminationForWholeStageCodegen(
  expressions: Seq[Expression]): SubExprCodes


subexpressionEliminationForWholeStageCodegen is used when:


Adding State

import org.apache.spark.sql.catalyst.expressions.codegen._
val ctx = new CodegenContext

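val input = ctx.addMutableState(
  "scala.collection.Iterator", // Java type of the mutable state
  "input",                     // variable name
  v => s"$v = inputs[0];")     // initialization code for the variable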


import org.apache.spark.sql.catalyst.expressions.codegen.CodegenContext
val ctx = new CodegenContext

// Use Catalyst DSL
import org.apache.spark.sql.catalyst.dsl.expressions._
val expressions = "hello".expr.as("world") :: "hello".expr.as("world") :: Nil

// FIXME Use a real-life query to extract the expressions

// CodegenContext.subexpressionElimination (where the elimination all happens) is a private method
// It is used exclusively in CodegenContext.generateExpressions which is public
// and does the elimination when it is enabled

// Note the doSubexpressionElimination flag is on
// Triggers the subexpressionElimination private method
ctx.generateExpressions(expressions, doSubexpressionElimination = true)

// subexpressionElimination private method uses ctx.equivalentExpressions
val commonExprs = ctx.equivalentExpressions.getAllEquivalentExprs

assert(commonExprs.length > 0, "No common expressions found")