CodeGenerator¶
CodeGenerator
is an abstraction of JVM bytecode generators for expression evaluation.
The Scala definition of this abstract class is as follows:
CodeGenerator[InType <: AnyRef, OutType <: AnyRef]
Contract¶
bind¶
bind(
in: InType,
inputSchema: Seq[Attribute]): InType
Used when:
CodeGenerator
is requested to generate
canonicalize¶
canonicalize(
in: InType): InType
Used when:
CodeGenerator
is requested to generate
create¶
create(
in: InType): OutType
Used when:
CodeGenerator
is requested to generate
Implementations¶
- GenerateColumnAccessor
- GenerateMutableProjection
- GenerateOrdering
- GeneratePredicate
- GenerateSafeProjection
- GenerateUnsafeProjection
- GenerateUnsafeRowJoiner
cache¶
cache: Cache[CodeAndComment, (GeneratedClass, ByteCodeStats)]
CodeGenerator
creates a cache of generated classes when loaded (as a Scala object).
When requested to look up a non-existent CodeAndComment, cache calls doCompile, updates CodegenMetrics and prints out the following INFO message to the logs:
Code generated in [timeMs] ms
cache
allows for up to spark.sql.codegen.cache.maxEntries pairs.
cache
is used when:
CodeGenerator
is requested to compile a Java source code.
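The load-on-miss behavior described above can be sketched with the standard library alone. This is a minimal illustration and not Spark's implementation: `LoadingLruCache`, `SourceCode` and `CacheDemo` are hypothetical names, a `java.util.LinkedHashMap` stands in for the real `Cache`, the size limit is hard-coded instead of being read from spark.sql.codegen.cache.maxEntries, and least-recently-used eviction is an assumption of the sketch.

```scala
// Hypothetical stand-in for Spark's CodeAndComment (assumption of this sketch).
final case class SourceCode(body: String)

// A tiny load-on-miss cache with a size bound.
final class LoadingLruCache[K, V](maxEntries: Int)(load: K => V) {
  // accessOrder = true keeps entries ordered from least to most recently used.
  private val entries = new java.util.LinkedHashMap[K, V](16, 0.75f, true) {
    override def removeEldestEntry(eldest: java.util.Map.Entry[K, V]): Boolean =
      this.size() > maxEntries
  }

  def get(key: K): V = this.synchronized {
    if (entries.containsKey(key)) {
      entries.get(key) // hit: no compilation
    } else {
      val start = System.nanoTime()
      val value = load(key) // miss: plays the role of doCompile
      val timeMs = (System.nanoTime() - start) / 1e6
      println(s"Code generated in $timeMs ms")
      entries.put(key, value)
      value
    }
  }

  def cachedCount: Int = this.synchronized(entries.size())
}

object CacheDemo {
  var compilations = 0
  // The loader only counts invocations and returns a placeholder "class".
  val cache = new LoadingLruCache[SourceCode, String](maxEntries = 2)({ code =>
    compilations += 1
    s"compiled(${code.body})"
  })
}
```

Requesting the same `SourceCode` twice runs the loader once. Once a third distinct key is loaded, the least recently used entry is dropped and would be compiled again on its next lookup.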
Compiling Java Code¶
compile(
code: CodeAndComment): (GeneratedClass, ByteCodeStats)
compile looks up the given CodeAndComment in the cache.
compile
is used when:
- GenerateMutableProjection is requested to create a MutableProjection
- GenerateOrdering is requested to create a BaseOrdering
- GeneratePredicate is requested to create a BasePredicate
- GenerateSafeProjection is requested to create a Projection
- GenerateUnsafeProjection is requested to create an UnsafeProjection
- GenerateUnsafeRowJoiner is requested to create an UnsafeRowJoiner
- WholeStageCodegenExec is requested to doExecute
- GenerateColumnAccessor is requested to create a ColumnarIterator
- debug utility is used to codegenStringSeq
generate¶
generate(
expressions: InType): OutType
generate(
expressions: InType,
inputSchema: Seq[Attribute]): OutType // Binds the input expressions to the given input schema
generate
creates a class for the input expressions
(after canonicalization).
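The relationship between the contract methods and the two generate overloads can be sketched as a template method: bind (only when a schema is given), then canonicalize, then create. The sketch below is a toy under stated assumptions and does not use Spark's classes: `ToyCodeGenerator`, `ToyProjection` and the one-field `Attribute` are hypothetical stand-ins, and the "generated class" is simply a Scala function.

```scala
// Hypothetical stand-in; Spark's real Attribute is far richer.
final case class Attribute(name: String)

abstract class ToyCodeGenerator[InType <: AnyRef, OutType <: AnyRef] {
  protected def bind(in: InType, inputSchema: Seq[Attribute]): InType
  protected def canonicalize(in: InType): InType
  protected def create(in: InType): OutType

  // Binds first, then delegates to the single-argument overload.
  def generate(expressions: InType, inputSchema: Seq[Attribute]): OutType =
    generate(bind(expressions, inputSchema))

  // Canonicalizes, then creates the output.
  def generate(expressions: InType): OutType =
    create(canonicalize(expressions))
}

// Toy generator: the input is a list of column names,
// the output is a function that projects those columns from a row.
object ToyProjection extends ToyCodeGenerator[Seq[String], Seq[Any] => Seq[Any]] {
  // Replace each column name with a positional reference such as "input[1]".
  protected def bind(in: Seq[String], inputSchema: Seq[Attribute]): Seq[String] =
    in.map { name =>
      val ordinal = inputSchema.indexWhere(_.name == name.trim)
      require(ordinal >= 0, s"Unknown column: $name")
      s"input[$ordinal]"
    }

  // Normalize whitespace so that equivalent inputs look identical.
  protected def canonicalize(in: Seq[String]): Seq[String] = in.map(_.trim)

  // Turn the positional references into a projection function.
  protected def create(in: Seq[String]): Seq[Any] => Seq[Any] = {
    val ordinals = in.map(_.stripPrefix("input[").stripSuffix("]").toInt)
    row => ordinals.map(i => row(i))
  }
}
```

With a schema of `id`, `name`, `age`, calling `ToyProjection.generate(Seq(" age", "id "), schema)` yields a function that picks the third and first values of a row.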
generate
is used when:
- Serializer (of ExpressionEncoder) is requested to apply
- RowOrdering utility is used to createCodeGeneratedObject
- SafeProjection utility is used to createCodeGeneratedObject
- LazilyGeneratedOrdering is requested for generatedOrdering
- ObjectOperator utility is used to deserializeRowToObject and serializeObjectToRow
- ComplexTypedAggregateExpression is requested for inputRowToObj and bufferRowToObject
- DefaultCachedBatchSerializer is requested to convertCachedBatchToInternalRow
Creating CodegenContext¶
newCodeGenContext(): CodegenContext
newCodeGenContext
creates a new CodegenContext.
newCodeGenContext
is used when:
- GenerateMutableProjection is requested to create a MutableProjection
- GenerateOrdering is requested to create a BaseOrdering
- GeneratePredicate utility is used to create a BasePredicate
- GenerateSafeProjection is requested to create a Projection
- GenerateUnsafeProjection utility is used to create an UnsafeProjection
- GenerateColumnAccessor is requested to create a ColumnarIterator
doCompile¶
doCompile(
code: CodeAndComment): (GeneratedClass, ByteCodeStats)
doCompile
creates a ClassBodyEvaluator
(Janino).
doCompile
requests the ClassBodyEvaluator
to use org.apache.spark.sql.catalyst.expressions.GeneratedClass
as the name of the generated class and sets some default imports (to be included in the generated class).
doCompile
requests the ClassBodyEvaluator
to use GeneratedClass
as a superclass of the generated class (for passing extra references
objects into the generated class).
abstract class GeneratedClass {
def generate(references: Array[Any]): Any
}
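To illustrate how references reach the generated code, here is a hand-written subclass standing in for a class that would normally be produced from Java source at runtime. `HandWrittenClass` and the layout of the references array (a single `String` prefix at index 0) are assumptions made for this sketch only.

```scala
abstract class GeneratedClass {
  def generate(references: Array[Any]): Any
}

// Hand-written stand-in for a runtime-generated class (hypothetical).
final class HandWrittenClass extends GeneratedClass {
  def generate(references: Array[Any]): Any = {
    // Sketch-only convention: references(0) holds a String prefix.
    val prefix = references(0).asInstanceOf[String]
    // The returned object closes over the value passed in via references.
    (name: String) => prefix + name
  }
}
```

The caller casts the result to the expected type, for example `new HandWrittenClass().generate(Array[Any]("Hello, ")).asInstanceOf[String => String]`. This mirrors why generate returns Any: the superclass cannot know what each generated class produces.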
doCompile prints out the following DEBUG message to the logs (with the given code):
[formatted code]
doCompile
requests the ClassBodyEvaluator
to cook (read, scan, parse and compile Java tokens) the source code and gets the bytecode statistics:
- max method bytecode size
- max constant pool size
- number of inner classes
doCompile
updates CodeGenerator
code-gen metrics.
In the end, doCompile
returns the GeneratedClass
instance and bytecode statistics.
doCompile
is used when:
CodeGenerator
is requested to look up a code (in the cache)
Logging¶
CodeGenerator
is an abstract class and logging is configured using the logger of the implementations.