CodegenContext¶
CodegenContext is a context for Whole-Stage Java Code Generation that tracks objects that could be passed into the generated Java code.
Creating Instance¶
CodegenContext takes no arguments to be created.
CodegenContext is created when:
- CodegenContext is requested for a new CodegenContext
- GenerateUnsafeRowJoiner utility is used to create an UnsafeRowJoiner
- WholeStageCodegenExec unary physical operator is requested for the Java source code for the child operator (when WholeStageCodegenExec is executed)
newCodeGenContext¶
newCodeGenContext(): CodegenContext
newCodeGenContext creates a new CodegenContext.
newCodeGenContext is used when:
- GenerateMutableProjection utility is used to create a MutableProjection
- GenerateOrdering utility is used to create a BaseOrdering
- GeneratePredicate utility is used to create a BasePredicate
- GenerateSafeProjection utility is used to create a Projection
- GenerateUnsafeProjection utility is used to create an UnsafeProjection
- GenerateColumnAccessor utility is used to create a ColumnarIterator
references¶
references: mutable.ArrayBuffer[Any]
CodegenContext uses the references collection for objects that could be passed into the generated class.
A new reference is added when:
- addReferenceObj is called
- CodegenFallback is requested to doGenCode
references is used when:
- WholeStageCodegenExec unary physical operator is requested to doExecute
- GenerateMutableProjection utility is used to create a MutableProjection
- GenerateOrdering utility is used to create a BaseOrdering
- GeneratePredicate utility is used to create a BasePredicate
- GenerateSafeProjection utility is used to create a Projection
- GenerateUnsafeProjection utility is used to create an UnsafeProjection
addReferenceObj¶
addReferenceObj(
objName: String,
obj: Any,
className: String = null): String
addReferenceObj adds the given obj to the references registry and returns the following code text:
(([clsName]) references[[idx]] /* [objName] */)
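The bookkeeping described above can be sketched without Spark on the classpath. This is a simplified, hypothetical model (ReferenceRegistry is not a Spark class), and it assumes that a missing className falls back to the runtime class name of the object. It only illustrates how the index into references ends up in the returned code text.

```scala
import scala.collection.mutable

// Hypothetical stand-in for the reference tracking of CodegenContext (not Spark code)
class ReferenceRegistry {
  val references: mutable.ArrayBuffer[Any] = mutable.ArrayBuffer.empty[Any]

  def addReferenceObj(objName: String, obj: Any, className: String = null): String = {
    // The new object is appended, so its index is the current size of the buffer
    val idx = references.length
    references += obj
    // Assumption: use the runtime class name when no class name is given
    val clsName = Option(className).getOrElse(obj.getClass.getName)
    s"(($clsName) references[$idx] /* $objName */)"
  }
}

val registry = new ReferenceRegistry
val first = registry.addReferenceObj("greeting", "hello")
val second = registry.addReferenceObj("limit", Int.box(10), "java.lang.Integer")

println(first)  // ((java.lang.String) references[0] /* greeting */)
println(second) // ((java.lang.Integer) references[1] /* limit */)
```

The returned text is a cast expression meant to be spliced into generated Java code, which is why the object itself has to stay in references until the generated class is instantiated.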
addReferenceObj is used when:
- AvroDataToCatalyst is requested to doGenCode
- CatalystDataToAvro is requested to doGenCode
- CastBase is requested to castToStringCode, castToDateCode, castToTimestampCode and castToTimestampNTZCode
- Catalyst Expressions are requested to doGenCode
- BroadcastHashJoinExec physical operator is requested to prepareBroadcast
- BroadcastNestedLoopJoinExec physical operator is requested to prepareBroadcast
- ShuffledHashJoinExec physical operator is requested to prepareRelation
- SortExec physical operator is requested to doProduce
- SortMergeJoinExec physical operator is requested to doProduce
- HashAggregateExec physical operator is requested to doProduceWithKeys
- CodegenSupport is requested to metricTerm
- HashMapGenerator is requested to initializeAggregateHashMap
generateExpressions¶
generateExpressions(
expressions: Seq[Expression],
doSubexpressionElimination: Boolean = false): Seq[ExprCode]
generateExpressions generates the Java source code for Code-Generated Evaluation of multiple Catalyst Expressions (with optional subexpression elimination).
With the given doSubexpressionElimination enabled, generateExpressions runs subexpressionElimination (with the given expressions).
In the end, generateExpressions requests every Expression (in the given expressions) for the Java source code for code-generated (non-interpreted) expression evaluation.
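The two-step flow above (optionally find shared subexpressions, then generate code for every expression) can be illustrated with a toy model. Everything below is hypothetical: Expr, Lit, Col, Add, commonSubexpressions and generate are made-up names, and the detection logic is a generic common-subexpression sketch, not Spark's algorithm. It only shows why enabling the flag changes the generated text.

```scala
// Toy expression tree; case classes give structural equality for free
sealed trait Expr
case class Lit(value: Int) extends Expr
case class Col(name: String) extends Expr
case class Add(left: Expr, right: Expr) extends Expr

// Collects every node of an expression tree (the root included)
def subtrees(e: Expr): Seq[Expr] = e match {
  case Add(l, r) => e +: (subtrees(l) ++ subtrees(r))
  case leaf      => Seq(leaf)
}

// Non-leaf subexpressions that occur more than once across all expressions
def commonSubexpressions(expressions: Seq[Expr]): Set[Expr] =
  expressions
    .flatMap(subtrees)
    .collect { case a: Add => a: Expr }
    .groupBy(identity)
    .collect { case (expr, occurrences) if occurrences.size > 1 => expr }
    .toSet

def generate(
    expressions: Seq[Expr],
    doSubexpressionElimination: Boolean = false): Seq[String] = {
  // Step 1 (optional): give every shared subexpression a variable name
  val shared: Map[Expr, String] =
    if (doSubexpressionElimination)
      commonSubexpressions(expressions).zipWithIndex
        .map { case (e, i) => e -> s"subExpr_$i" }
        .toMap
    else Map.empty
  // Step 2: generate code, reusing the variable wherever a shared subexpression appears
  def gen(e: Expr): String = shared.getOrElse(e, e match {
    case Lit(v)    => v.toString
    case Col(n)    => n
    case Add(l, r) => s"(${gen(l)} + ${gen(r)})"
  })
  expressions.map(gen)
}

val common = Add(Col("x"), Lit(1))
val exprs = Seq(Add(common, Col("y")), Add(common, Lit(2)))

println(generate(exprs))
// List(((x + 1) + y), ((x + 1) + 2))
println(generate(exprs, doSubexpressionElimination = true))
// List((subExpr_0 + y), (subExpr_0 + 2))
```

A real code generator would also have to emit the code that computes each shared variable once; the sketch leaves that part out to keep the flag's effect visible.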
generateExpressions is used when:
- GenerateMutableProjection is requested to create a MutableProjection
- GeneratePredicate is requested to create
- GenerateUnsafeProjection is requested to create
- HashAggregateExec physical operator is requested for the Java source code for the whole-stage consume path with grouping keys
subexpressionElimination¶
subexpressionElimination(
expressions: Seq[Expression]): Unit
subexpressionElimination
...FIXME
subexpressionEliminationForWholeStageCodegen¶
subexpressionEliminationForWholeStageCodegen(
expressions: Seq[Expression]): SubExprCodes
subexpressionEliminationForWholeStageCodegen
...FIXME
subexpressionEliminationForWholeStageCodegen is used when:
- ProjectExec is requested to doConsume
- AggregateCodegenSupport is requested to doConsumeWithoutKeys
- HashAggregateExec is requested to doConsumeWithKeys
Demo¶
Adding State¶
import org.apache.spark.sql.catalyst.expressions.codegen._
val ctx = new CodegenContext
val input = ctx.addMutableState(
"scala.collection.Iterator",
"input",
v => s"$v = inputs[0];")
CodegenContext.subexpressionElimination¶
import org.apache.spark.sql.catalyst.expressions.codegen.CodegenContext
val ctx = new CodegenContext
// Use Catalyst DSL
import org.apache.spark.sql.catalyst.dsl.expressions._
val expressions = "hello".expr.as("world") :: "hello".expr.as("world") :: Nil
// FIXME Use a real-life query to extract the expressions
// CodegenContext.subexpressionElimination (where the elimination all happens) is a private method
// It is used exclusively in CodegenContext.generateExpressions which is public
// and does the elimination when it is enabled
// Note the doSubexpressionElimination flag is on
// Triggers the subexpressionElimination private method
ctx.generateExpressions(expressions, doSubexpressionElimination = true)
// subexpressionElimination private method uses ctx.equivalentExpressions
val commonExprs = ctx.equivalentExpressions.getAllEquivalentExprs
assert(commonExprs.length > 0, "No common expressions found")