CodeGenerator¶
CodeGenerator is an abstraction of JVM bytecode generators for expression evaluation.
The Scala definition of this abstract class is as follows:
CodeGenerator[InType <: AnyRef, OutType <: AnyRef]
Contract¶
bind¶
bind(
in: InType,
inputSchema: Seq[Attribute]): InType
Used when:
- CodeGenerator is requested to generate
canonicalize¶
canonicalize(
in: InType): InType
Used when:
- CodeGenerator is requested to generate
create¶
create(
in: InType): OutType
Used when:
- CodeGenerator is requested to generate
Implementations¶
- GenerateColumnAccessor
- GenerateMutableProjection
- GenerateOrdering
- GeneratePredicate
- GenerateSafeProjection
- GenerateUnsafeProjection
- GenerateUnsafeRowJoiner
cache¶
cache: Cache[CodeAndComment, (GeneratedClass, ByteCodeStats)]
CodeGenerator creates a cache of generated classes when loaded (as a Scala object).
When requested to look up a CodeAndComment that is not cached yet, cache runs doCompile, updates CodegenMetrics and prints out the following INFO message to the logs:
Code generated in [timeMs] ms
cache allows for up to spark.sql.codegen.cache.maxEntries pairs.
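The load-on-miss behaviour described above can be sketched without any Spark dependency. This is a minimal illustration only: `BoundedLoadingCache`, its `loads` counter and the least-recently-used eviction policy are assumptions of the sketch, and a `LinkedHashMap` stands in for whatever cache library CodeGenerator actually uses.

```scala
import java.util.{LinkedHashMap => JLinkedHashMap, Map => JMap}

// Hypothetical stand-in for a loading cache bounded by a maximum number of entries
// (the role spark.sql.codegen.cache.maxEntries plays for the real cache).
final class BoundedLoadingCache[K, V](maxEntries: Int)(load: K => V) {
  // Number of cache misses, i.e. how many times `load` had to run
  var loads = 0

  // Access-ordered map that drops the least recently used entry once full
  // (LRU eviction is an assumption of this sketch)
  private val entries = new JLinkedHashMap[K, V](16, 0.75f, true) {
    override def removeEldestEntry(eldest: JMap.Entry[K, V]): Boolean =
      size() > maxEntries
  }

  def get(key: K): V = synchronized {
    if (entries.containsKey(key)) {
      entries.get(key)
    } else {
      // Cache miss: compute the value (the role doCompile plays), time it and store it
      val start = System.nanoTime()
      val value = load(key)
      val timeMs = (System.nanoTime() - start) / 1e6
      loads += 1
      println(s"Code generated in $timeMs ms")
      entries.put(key, value)
      value
    }
  }
}
```

With `new BoundedLoadingCache[String, Int](2)(_.length)`, asking for the same key twice runs the loader once; adding two more keys evicts the first one, so the next lookup of it runs the loader again.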
cache is used when:
- CodeGenerator is requested to compile Java source code
Compiling Java Code¶
compile(
code: CodeAndComment): (GeneratedClass, ByteCodeStats)
compile looks the given CodeAndComment up in the cache.
compile is used when:
- GenerateMutableProjection is requested to create a MutableProjection
- GenerateOrdering is requested to create a BaseOrdering
- GeneratePredicate is requested to create a BasePredicate
- GenerateSafeProjection is requested to create a Projection
- GenerateUnsafeProjection is requested to create an UnsafeProjection
- GenerateUnsafeRowJoiner is requested to create an UnsafeRowJoiner
- WholeStageCodegenExec is requested to doExecute
- GenerateColumnAccessor is requested to create a ColumnarIterator
- debug utility is used to codegenStringSeq
generate¶
generate(
expressions: InType): OutType
generate(
expressions: InType,
inputSchema: Seq[Attribute]): OutType // binds the input expressions to the given input schema
generate creates a class for the input expressions (after canonicalization).
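How the two generate variants relate to the bind, canonicalize and create contract can be sketched as a template method. The following is a simplified, self-contained model: the `Attribute` case class and `ToyGenerator` are hypothetical stand-ins (the real types live in Spark Catalyst), and the toy "expressions" are plain column names rather than Catalyst expressions.

```scala
// Hypothetical stand-in for Catalyst's Attribute
final case class Attribute(name: String)

abstract class CodeGenerator[InType <: AnyRef, OutType <: AnyRef] {
  // Contract left to the implementations
  protected def bind(in: InType, inputSchema: Seq[Attribute]): InType
  protected def canonicalize(in: InType): InType
  protected def create(in: InType): OutType

  // Binds the expressions to the schema first, then delegates
  def generate(expressions: InType, inputSchema: Seq[Attribute]): OutType =
    generate(bind(expressions, inputSchema))

  // Canonicalizes the expressions and creates the output for them
  def generate(expressions: InType): OutType =
    create(canonicalize(expressions))
}

// Toy implementation: resolves column names to ordinals and renders a description
object ToyGenerator extends CodeGenerator[Seq[String], String] {
  protected def bind(in: Seq[String], inputSchema: Seq[Attribute]): Seq[String] =
    in.map(name => s"input[${inputSchema.indexWhere(_.name == name)}]")

  protected def canonicalize(in: Seq[String]): Seq[String] =
    in.map(_.trim)

  protected def create(in: Seq[String]): String =
    in.mkString("project(", ", ", ")")
}
```

Calling `ToyGenerator.generate(Seq("b", "a"), Seq(Attribute("a"), Attribute("b")))` walks through all three steps and yields `project(input[1], input[0])`.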
generate is used when:
- Serializer (of ExpressionEncoder) is requested to apply
- RowOrdering utility is used to createCodeGeneratedObject
- SafeProjection utility is used to createCodeGeneratedObject
- LazilyGeneratedOrdering is requested for generatedOrdering
- ObjectOperator utility is used to deserializeRowToObject and serializeObjectToRow
- ComplexTypedAggregateExpression is requested for inputRowToObj and bufferRowToObject
- DefaultCachedBatchSerializer is requested to convertCachedBatchToInternalRow
Creating CodegenContext¶
newCodeGenContext(): CodegenContext
newCodeGenContext creates a new CodegenContext.
newCodeGenContext is used when:
- GenerateMutableProjection is requested to create a MutableProjection
- GenerateOrdering is requested to create a BaseOrdering
- GeneratePredicate utility is used to create a BasePredicate
- GenerateSafeProjection is requested to create a Projection
- GenerateUnsafeProjection utility is used to create an UnsafeProjection
- GenerateColumnAccessor is requested to create a ColumnarIterator
doCompile¶
doCompile(
code: CodeAndComment): (GeneratedClass, ByteCodeStats)
doCompile creates a ClassBodyEvaluator (Janino).
doCompile requests the ClassBodyEvaluator to use org.apache.spark.sql.catalyst.expressions.GeneratedClass as the name of the generated class and sets some default imports (to be included in the generated class).
doCompile requests the ClassBodyEvaluator to use GeneratedClass as a superclass of the generated class (for passing extra references objects into the generated class).
abstract class GeneratedClass {
def generate(references: Array[Any]): Any
}
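To illustrate why the superclass takes a references array, here is a small sketch in which a hand-written subclass plays the part of a class that would normally be compiled from generated Java source. `SpecificGreeter` and the shape of its result are hypothetical; only the `GeneratedClass` signature comes from the definition above.

```scala
abstract class GeneratedClass {
  def generate(references: Array[Any]): Any
}

// Hand-written stand-in for a class compiled from generated source (hypothetical)
class SpecificGreeter extends GeneratedClass {
  def generate(references: Array[Any]): Any = {
    // Objects that cannot be inlined into source text arrive through `references`
    val prefix = references(0).asInstanceOf[String]
    (name: String) => s"$prefix, $name"
  }
}
```

A caller would invoke `generate` with the objects the generated code needs and cast the result to the expected type, for example `new SpecificGreeter().generate(Array[Any]("Hello")).asInstanceOf[String => String]`.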
doCompile prints out the following DEBUG message to the logs (with the given code):
[formatted code]
doCompile requests the ClassBodyEvaluator to cook (read, scan, parse and compile Java tokens) the source code and gets the bytecode statistics:
- max method bytecode size
- max constant pool size
- number of inner classes
doCompile updates CodeGenerator code-gen metrics.
In the end, doCompile returns the GeneratedClass instance and bytecode statistics.
doCompile is used when:
- CodeGenerator is requested to look up code (in the cache)
Logging¶
CodeGenerator is an abstract class and logging is configured using the logger of the implementations.