CodegenContext

CodegenContext is a context for Whole-Stage Java Code Generation to track objects (that could be passed into generated Java code).

Creating Instance

CodegenContext takes no arguments to be created.

CodegenContext is created when:

newCodeGenContext

newCodeGenContext(): CodegenContext

newCodeGenContext creates a new CodegenContext.

newCodeGenContext is used when:

references

references: mutable.ArrayBuffer[Any]

CodegenContext uses the references collection for objects that could be passed into the generated class.

A new reference is added:

Used when:

addReferenceObj

addReferenceObj(
  objName: String,
  obj: Any,
  className: String = null): String

addReferenceObj adds the given obj to the references registry and returns the following code text:

(([clsName]) references[[idx]] /* [objName] */)
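
The following is a minimal sketch of that behaviour, not Spark's implementation. `ReferenceRegistry` is a hypothetical stand-in for CodegenContext reduced to the references collection, and falling back to the object's runtime class name when no className is given is an assumption made for illustration. It can be pasted into a Scala REPL (or spark-shell).

```scala
import scala.collection.mutable

// Hypothetical stand-in for CodegenContext, reduced to the references registry
class ReferenceRegistry {
  val references: mutable.ArrayBuffer[Any] = mutable.ArrayBuffer.empty[Any]

  def addReferenceObj(
      objName: String,
      obj: Any,
      className: String = null): String = {
    // The index of the new object is the current size of the registry
    val idx = references.length
    references += obj
    // Assumption: use the runtime class name when className is not given
    val clsName = Option(className).getOrElse(obj.getClass.getName)
    // Java expression that reads the object back and casts it
    s"(($clsName) references[$idx] /* $objName */)"
  }
}

val registry = new ReferenceRegistry
val code = registry.addReferenceObj("greeting", "hello")
println(code)
// prints ((java.lang.String) references[0] /* greeting */)
```

The returned text is meant to be spliced into generated Java code, where `references` is the array of objects handed to the generated class.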

addReferenceObj is used when:

  • AvroDataToCatalyst is requested to doGenCode
  • CatalystDataToAvro is requested to doGenCode
  • CastBase is requested to castToStringCode, castToDateCode, castToTimestampCode and castToTimestampNTZCode
  • Catalyst Expressions are requested to doGenCode
  • BroadcastHashJoinExec physical operator is requested to prepareBroadcast
  • BroadcastNestedLoopJoinExec physical operator is requested to prepareBroadcast
  • ShuffledHashJoinExec physical operator is requested to prepareRelation
  • SortExec physical operator is requested to doProduce
  • SortMergeJoinExec physical operator is requested to doProduce
  • HashAggregateExec physical operator is requested to doProduceWithKeys
  • CodegenSupport is requested to metricTerm
  • HashMapGenerator is requested to initializeAggregateHashMap

generateExpressions

generateExpressions(
  expressions: Seq[Expression],
  doSubexpressionElimination: Boolean = false): Seq[ExprCode]

generateExpressions generates Java source code for Code-Generated Evaluation of multiple Catalyst Expressions (with optional subexpression elimination).


With the given doSubexpressionElimination enabled, generateExpressions runs subexpressionElimination (with the given expressions).

In the end, generateExpressions requests every Expression (in the given expressions) to generate Java source code for code-generated (non-interpreted) expression evaluation.
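
The two steps can be illustrated with a toy model. Everything in it is hypothetical (`ToyExpr`, `ToyExprCode`, `ToyContext`, the `subExpr_N` variable names and the `eval(...)` placeholder code), and the way repeated expressions are detected is an assumption that only mirrors the control flow described here, not how Spark finds or compiles common subexpressions.

```scala
// Hypothetical, simplified stand-ins; not Spark's Expression or ExprCode
final case class ToyExpr(sql: String)
final case class ToyExprCode(code: String)

class ToyContext {
  // Repeated expressions mapped to the variable that holds their shared result
  private var shared = Map.empty[ToyExpr, String]

  // Step 1 (optional): register expressions that occur more than once
  private def subexpressionElimination(expressions: Seq[ToyExpr]): Unit = {
    val repeated = expressions
      .groupBy(identity)
      .collect { case (e, occurrences) if occurrences.size > 1 => e }
    shared = repeated.zipWithIndex.map { case (e, i) => e -> s"subExpr_$i" }.toMap
  }

  def generateExpressions(
      expressions: Seq[ToyExpr],
      doSubexpressionElimination: Boolean = false): Seq[ToyExprCode] = {
    if (doSubexpressionElimination) subexpressionElimination(expressions)
    // Step 2: ask every expression for its code, reusing shared results if any
    expressions.map { e =>
      ToyExprCode(shared.getOrElse(e, s"eval(${e.sql})"))
    }
  }
}

val exprs = Seq(ToyExpr("a + b"), ToyExpr("a + b"), ToyExpr("c"))
val plain = new ToyContext().generateExpressions(exprs)
val eliminated = new ToyContext().generateExpressions(exprs, doSubexpressionElimination = true)
println(plain.map(_.code))
println(eliminated.map(_.code))
```

With the flag off, every expression gets its own evaluation code. With the flag on, both occurrences of `a + b` resolve to the same shared variable.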


generateExpressions is used when:

subexpressionElimination

subexpressionElimination(
  expressions: Seq[Expression]): Unit

subexpressionElimination...FIXME

subexpressionEliminationForWholeStageCodegen

subexpressionEliminationForWholeStageCodegen(
  expressions: Seq[Expression]): SubExprCodes

subexpressionEliminationForWholeStageCodegen...FIXME


subexpressionEliminationForWholeStageCodegen is used when:

Demo

Adding State

import org.apache.spark.sql.catalyst.expressions.codegen._
val ctx = new CodegenContext

val input = ctx.addMutableState(
  "scala.collection.Iterator",
  "input",
  v => s"$v = inputs[0];")

CodegenContext.subexpressionElimination

import org.apache.spark.sql.catalyst.expressions.codegen.CodegenContext
val ctx = new CodegenContext

// Use Catalyst DSL
import org.apache.spark.sql.catalyst.dsl.expressions._
val expressions = "hello".expr.as("world") :: "hello".expr.as("world") :: Nil

// FIXME Use a real-life query to extract the expressions

// CodegenContext.subexpressionElimination (where all the elimination happens) is a private method
// It is used exclusively in CodegenContext.generateExpressions, which is public
// and runs the elimination when doSubexpressionElimination is enabled

// Note that the doSubexpressionElimination flag is on,
// which triggers the private subexpressionElimination method
ctx.generateExpressions(expressions, doSubexpressionElimination = true)

// The private subexpressionElimination method records its findings in ctx.equivalentExpressions
val commonExprs = ctx.equivalentExpressions.getAllEquivalentExprs

assert(commonExprs.length > 0, "No common expressions found")