

CodeGenerator is an abstraction of JVM bytecode generators for expression evaluation.

The Scala definition of this abstract class is as follows:

abstract class CodeGenerator[InType <: AnyRef, OutType <: AnyRef]



bind(
  in: InType,
  inputSchema: Seq[Attribute]): InType

Used when:

  • CodeGenerator is requested to generate


canonicalize(
  in: InType): InType

Used when:

  • CodeGenerator is requested to generate


create(
  in: InType): OutType

Used when:

  • CodeGenerator is requested to generate
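The contract above has a template-method shape: a concrete generator supplies bind, canonicalize and create, and generate chains them. The following is a minimal, self-contained model of that shape, not Spark code: MiniCodeGenerator, Attr (a stand-in for Attribute) and ToyProjectionGenerator are hypothetical names, and the "generated code" is just a string rather than a compiled class.

```scala
// Hypothetical stand-in for Spark's Attribute: only a column name.
final case class Attr(name: String)

// Hypothetical model of the CodeGenerator contract (not Spark's class).
abstract class MiniCodeGenerator[InType <: AnyRef, OutType <: AnyRef] {
  protected def create(in: InType): OutType
  protected def canonicalize(in: InType): InType
  protected def bind(in: InType, inputSchema: Seq[Attr]): InType

  // Canonicalize first, then create the output for the canonical form.
  def generate(expressions: InType): OutType =
    create(canonicalize(expressions))

  // Schema-aware variant: bind to the input schema, then delegate.
  def generate(expressions: InType, inputSchema: Seq[Attr]): OutType =
    generate(bind(expressions, inputSchema))
}

// Toy generator: "expressions" are column names, output is a line of Java-like code.
object ToyProjectionGenerator extends MiniCodeGenerator[Seq[String], String] {
  // Canonicalization here only normalizes spelling.
  protected def canonicalize(in: Seq[String]): Seq[String] =
    in.map(_.trim.toLowerCase)

  // Binding replaces each column name with a positional read from the input row.
  protected def bind(in: Seq[String], inputSchema: Seq[Attr]): Seq[String] =
    in.map { name =>
      val ordinal = inputSchema.indexWhere(_.name.equalsIgnoreCase(name.trim))
      require(ordinal >= 0, s"Cannot bind $name")
      s"input[$ordinal]"
    }

  // "Code generation": emit a single statement over the (bound) expressions.
  protected def create(in: Seq[String]): String =
    in.mkString("return new Object[] {", ", ", "};")
}
```

With a schema, generate binds first, so column names turn into positional reads; without one, only canonicalization runs before create.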



cache: Cache[CodeAndComment, (GeneratedClass, ByteCodeStats)]

CodeGenerator creates a cache of generated classes when loaded (as a Scala object).

When requested to look up a CodeAndComment that is not cached yet, cache compiles it using doCompile, updates CodegenMetrics and prints out the following INFO message to the logs:

Code generated in [timeMs] ms

cache allows for up to spark.sql.codegen.cache.maxEntries pairs.
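The look-up-or-compile behaviour described above can be sketched as a bounded load-on-miss cache. This is an assumption-laden stand-in, not Spark's implementation: Spark builds its cache with a caching library, whereas CompileCache (hypothetical) uses an access-ordered java.util.LinkedHashMap, so eviction here is strict least-recently-used; plain strings stand in for CodeAndComment and the compiled classes, and maxEntries plays the role of spark.sql.codegen.cache.maxEntries.

```scala
import java.util.{LinkedHashMap => JLinkedHashMap, Map => JMap}

// Hypothetical bounded cache that runs `load` (doCompile in Spark) only on a miss.
final class CompileCache[K, V](maxEntries: Int, load: K => V) {
  private var loadCounter = 0

  // accessOrder = true makes iteration order least-recently-accessed first,
  // so removeEldestEntry evicts the least recently used entry.
  private val entries: JLinkedHashMap[K, V] =
    new JLinkedHashMap[K, V](16, 0.75f, true) {
      override protected def removeEldestEntry(eldest: JMap.Entry[K, V]): Boolean =
        this.size() > maxEntries
    }

  def get(key: K): V = synchronized {
    if (entries.containsKey(key)) entries.get(key) // hit: no compilation
    else {
      val start = System.nanoTime()
      val value = load(key) // the expensive step
      val timeMs = (System.nanoTime() - start).toDouble / 1000000
      println(s"Code generated in $timeMs ms") // mimics the INFO message
      loadCounter += 1
      entries.put(key, value)
      value
    }
  }

  def loadCount: Int = synchronized(loadCounter)
  def entryCount: Int = synchronized(entries.size())
}
```

Compiling the same source twice loads it once; once the bound is exceeded, the least recently used entry is dropped and would be recompiled on the next request.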

cache is used when:

  • CodeGenerator is requested to compile Java code

Compiling Java Code

compile(
  code: CodeAndComment): (GeneratedClass, ByteCodeStats)

compile looks the given CodeAndComment up in the cache.

compile is used when:


generate(
  expressions: InType): OutType
generate(
  expressions: InType,
  inputSchema: Seq[Attribute]): OutType // (1)

(1) Binds the input expressions to the given input schema and then generates code for the bound expressions

generate canonicalizes the input expressions and then uses create to generate the code-generated object (OutType) for them.

generate is used when:

  • Serializer (of ExpressionEncoder) is requested to apply
  • RowOrdering utility is used to createCodeGeneratedObject
  • SafeProjection utility is used to createCodeGeneratedObject
  • LazilyGeneratedOrdering is requested for generatedOrdering
  • ObjectOperator utility is used to deserializeRowToObject and serializeObjectToRow
  • ComplexTypedAggregateExpression is requested for inputRowToObj and bufferRowToObject
  • DefaultCachedBatchSerializer is requested to convertCachedBatchToInternalRow
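The binding step noted in (1) replaces references to input attributes with their positions in the input schema, so that generated code can read columns by ordinal. The toy below sketches only that idea: Expr, AttributeRef, BoundRef, Add and Binder are hypothetical (Spark's BoundReference plays a role similar to BoundRef), and a schema is just a list of column names.

```scala
// Hypothetical mini expression tree, for illustration only.
sealed trait Expr
final case class AttributeRef(name: String) extends Expr
final case class BoundRef(ordinal: Int) extends Expr
final case class Add(left: Expr, right: Expr) extends Expr

object Binder {
  // Replace every attribute reference with the position of that column in the schema.
  def bind(expr: Expr, inputSchema: Seq[String]): Expr = expr match {
    case AttributeRef(name) =>
      val ordinal = inputSchema.indexOf(name)
      require(ordinal >= 0, s"Cannot find $name in the input schema")
      BoundRef(ordinal)
    case Add(l, r)       => Add(bind(l, inputSchema), bind(r, inputSchema))
    case bound: BoundRef => bound
  }

  // A bound expression can be evaluated against a row by position alone.
  def eval(expr: Expr, row: IndexedSeq[Int]): Int = expr match {
    case BoundRef(ordinal) => row(ordinal)
    case Add(l, r)         => eval(l, row) + eval(r, row)
    case AttributeRef(name) =>
      throw new IllegalStateException(s"Unbound attribute $name")
  }
}
```

After binding, evaluation no longer needs column names, which is what lets generated code work directly on positional rows.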

Creating CodegenContext

newCodeGenContext(): CodegenContext

newCodeGenContext creates a new CodegenContext.
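A plausible reason for handing out a new context per request is that a context accumulates mutable state while code is being generated. The sketch assumes that state is nothing more than a counter for fresh variable names; MiniCodegenContext, freshName and MiniGenerator are hypothetical, and the real CodegenContext tracks far more.

```scala
import scala.collection.mutable

// Hypothetical, heavily simplified context: a per-prefix counter for unique names.
final class MiniCodegenContext {
  private val freshNameIds = mutable.HashMap.empty[String, Int]

  def freshName(prefix: String): String = {
    val id = freshNameIds.getOrElse(prefix, 0)
    freshNameIds(prefix) = id + 1
    s"${prefix}_$id"
  }
}

object MiniGenerator {
  // Like newCodeGenContext: every generation starts from an empty context.
  def newCodeGenContext(): MiniCodegenContext = new MiniCodegenContext

  // Emit one variable declaration per input column, each with a unique name.
  def generateBody(columns: Int): String = {
    val ctx = newCodeGenContext()
    (0 until columns)
      .map(i => s"Object ${ctx.freshName("value")} = input[$i];")
      .mkString("\n")
  }
}
```

Because each call gets its own context, names are unique within one generated class and numbering does not leak from one generation into the next.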

newCodeGenContext is used when:


doCompile(
  code: CodeAndComment): (GeneratedClass, ByteCodeStats)

doCompile creates a ClassBodyEvaluator (Janino).

doCompile requests the ClassBodyEvaluator to use org.apache.spark.sql.catalyst.expressions.GeneratedClass as the name of the generated class and sets some default imports (to be included in the generated class).

doCompile requests the ClassBodyEvaluator to use GeneratedClass as a superclass of the generated class (for passing extra reference objects into the generated class).

abstract class GeneratedClass {
  def generate(references: Array[Any]): Any
}
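The references array is how objects that cannot be spelled out as Java literals reach the generated code. Below is a hand-written stand-in for a class Janino would compile: GeneratedClassSketch redeclares the GeneratedClass shape so the block is self-contained, and SpecificPrefixer is a hypothetical generated class that reads its only reference by position.

```scala
// Same shape as GeneratedClass, redeclared so this sketch compiles on its own.
abstract class GeneratedClassSketch {
  def generate(references: Array[Any]): Any
}

// Hypothetical "generated" class: it cannot capture JVM objects directly,
// so it picks them out of the references array by index.
final class SpecificPrefixer extends GeneratedClassSketch {
  override def generate(references: Array[Any]): Any = {
    val prefix = references(0).asInstanceOf[String]
    (s: String) => prefix + s
  }
}
```

The caller instantiates the class, passes the references in, and casts the result to the type it expects.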

doCompile prints out the following DEBUG message to the logs (with the given code):

[formatted code]

doCompile requests the ClassBodyEvaluator to cook (read, scan, parse and compile Java tokens) the source code and gets the bytecode statistics:

  • max method bytecode size
  • max constant pool size
  • number of inner classes

doCompile updates CodeGenerator code-gen metrics.

In the end, doCompile returns the GeneratedClass instance and bytecode statistics.
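The statistics are maxima over all compiled classes, which is useful because JVM limits apply per method and per class file (a method's bytecode, for example, must stay below 64KB). The sketch aggregates hypothetical per-class measurements into the three numbers listed above; ClassStats, ByteCodeStatsSketch and StatsAggregator are illustrative names, and counting inner classes through an isInner flag is an assumption rather than how Spark derives that number.

```scala
// Hypothetical measurements for one compiled class (outer or nested).
final case class ClassStats(
    name: String,
    maxMethodSize: Int,
    constPoolSize: Int,
    isInner: Boolean)

// Modeled on the three statistics above; not Spark's ByteCodeStats.
final case class ByteCodeStatsSketch(
    maxMethodCodeSize: Int,
    maxConstPoolSize: Int,
    numInnerClasses: Int)

object StatsAggregator {
  // Keep the worst case across classes, since limits are checked per method / per class.
  def aggregate(perClass: Seq[ClassStats]): ByteCodeStatsSketch = {
    require(perClass.nonEmpty, "at least one compiled class is expected")
    ByteCodeStatsSketch(
      maxMethodCodeSize = perClass.map(_.maxMethodSize).max,
      maxConstPoolSize = perClass.map(_.constPoolSize).max,
      numInnerClasses = perClass.count(_.isInner))
  }
}
```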

doCompile is used when:

  • CodeGenerator is requested to look up code that is not in the cache yet


Logging

CodeGenerator is an abstract class, so logging is configured using the loggers of its implementations.