CodegenContext¶
CodegenContext is a context for Whole-Stage Java Code Generation to track objects (that could be passed into generated Java code).
Creating Instance¶
CodegenContext takes no arguments to be created.
CodegenContext is created when:
- CodegenContext is requested for a new CodegenContext
- GenerateUnsafeRowJoiner utility is used to create an UnsafeRowJoiner
- WholeStageCodegenExec unary physical operator is requested for a Java source code for the child operator (when WholeStageCodegenExec is executed)
newCodeGenContext¶
newCodeGenContext(): CodegenContext
newCodeGenContext creates a new CodegenContext.
newCodeGenContext is used when:
- GenerateMutableProjection utility is used to create a MutableProjection
- GenerateOrdering utility is used to create a BaseOrdering
- GeneratePredicate utility is used to create a BasePredicate
- GenerateSafeProjection utility is used to create a Projection
- GenerateUnsafeProjection utility is used to create an UnsafeProjection
- GenerateColumnAccessor utility is used to create a ColumnarIterator
references¶
references: mutable.ArrayBuffer[Any]
CodegenContext uses references collection for objects that could be passed into generated class.
A new reference is added:
- addReferenceObj
- CodegenFallback is requested to doGenCode
references is used when:
- WholeStageCodegenExec unary physical operator is requested to doExecute
- GenerateMutableProjection utility is used to create a MutableProjection
- GenerateOrdering utility is used to create a BaseOrdering
- GeneratePredicate utility is used to create a BasePredicate
- GenerateSafeProjection utility is used to create a Projection
- GenerateUnsafeProjection utility is used to create an UnsafeProjection
addReferenceObj¶
addReferenceObj(
  objName: String,
  obj: Any,
  className: String = null): String
addReferenceObj adds the given obj to the references registry and returns the following code text:
(([clsName]) references[[idx]] /* [objName] */)
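As a rough illustration of the code text above, the following spark-shell sketch registers a plain string and passes the class name explicitly. It assumes the Spark SQL Catalyst classes are on the classpath; the object name `greeting` is made up for this example and is not part of Spark.

```scala
import org.apache.spark.sql.catalyst.expressions.codegen.CodegenContext

val ctx = new CodegenContext

// "greeting" is a hypothetical name used only for this sketch;
// it ends up in the comment part of the returned code text
val term = ctx.addReferenceObj("greeting", "hello", "java.lang.String")

// The object is appended to the references registry
assert(ctx.references.length == 1)
assert(ctx.references.head == "hello")

// The returned text casts the slot in the references array to the given class
assert(term == "((java.lang.String) references[0] /* greeting */)")
```

The returned string is meant to be spliced into generated Java code, where `references` is the array of objects handed to the generated class.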
addReferenceObj is used when:
- AvroDataToCatalyst is requested to doGenCode
- CatalystDataToAvro is requested to doGenCode
- CastBase is requested to castToStringCode, castToDateCode, castToTimestampCode and castToTimestampNTZCode
- Catalyst Expressions are requested to doGenCode
- BroadcastHashJoinExec physical operator is requested to prepareBroadcast
- BroadcastNestedLoopJoinExec physical operator is requested to prepareBroadcast
- ShuffledHashJoinExec physical operator is requested to prepareRelation
- SortExec physical operator is requested to doProduce
- SortMergeJoinExec physical operator is requested to doProduce
- HashAggregateExec physical operator is requested to doProduceWithKeys
- CodegenSupport is requested to metricTerm
- HashMapGenerator is requested to initializeAggregateHashMap
generateExpressions¶
generateExpressions(
  expressions: Seq[Expression],
  doSubexpressionElimination: Boolean = false): Seq[ExprCode]
generateExpressions generates a Java source code for Code-Generated Evaluation of multiple Catalyst Expressions (with optional subexpression elimination).
With the given doSubexpressionElimination enabled, generateExpressions first runs subexpressionElimination (with the given expressions).
In the end, generateExpressions requests every Expression (in the given expressions) for a Java source code for code-generated (non-interpreted) expression evaluation.
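The flow above can be tried out in spark-shell with a minimal sketch like the one below. It assumes the Spark SQL Catalyst classes are on the classpath and uses two literal expressions, picked only because they need no input row to generate code; doSubexpressionElimination stays at its default.

```scala
import org.apache.spark.sql.catalyst.expressions.Literal
import org.apache.spark.sql.catalyst.expressions.codegen.CodegenContext

val ctx = new CodegenContext

// Two literal expressions, chosen only because they need no input row
val expressions = Seq(Literal(1), Literal("two"))

// doSubexpressionElimination is left at its default (false)
val codes = ctx.generateExpressions(expressions)

// One ExprCode per input expression, in the same order
assert(codes.length == expressions.length)

// Each ExprCode carries the Java snippet plus the isNull and value terms
codes.foreach { c =>
  println(s"code=[${c.code}] isNull=${c.isNull} value=${c.value}")
}
```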
generateExpressions is used when:
- GenerateMutableProjection is requested to create a MutableProjection
- GeneratePredicate is requested to create
- GenerateUnsafeProjection is requested to create
- HashAggregateExec physical operator is requested for a Java source code for whole-stage consume path with grouping keys
subexpressionElimination¶
subexpressionElimination(
  expressions: Seq[Expression]): Unit
subexpressionElimination...FIXME
subexpressionEliminationForWholeStageCodegen¶
subexpressionEliminationForWholeStageCodegen(
  expressions: Seq[Expression]): SubExprCodes
subexpressionEliminationForWholeStageCodegen...FIXME
subexpressionEliminationForWholeStageCodegen is used when:
- ProjectExec is requested to doConsume
- AggregateCodegenSupport is requested to doConsumeWithoutKeys
- HashAggregateExec is requested to doConsumeWithKeys
Demo¶
Adding State¶
import org.apache.spark.sql.catalyst.expressions.codegen._
val ctx = new CodegenContext
val input = ctx.addMutableState(
  "scala.collection.Iterator",
  "input",
  v => s"$v = inputs[0];")
CodegenContext.subexpressionElimination¶
import org.apache.spark.sql.catalyst.expressions.codegen.CodegenContext
val ctx = new CodegenContext
// Use Catalyst DSL
import org.apache.spark.sql.catalyst.dsl.expressions._
val expressions = "hello".expr.as("world") :: "hello".expr.as("world") :: Nil
// FIXME Use a real-life query to extract the expressions
// CodegenContext.subexpressionElimination (where the elimination all happens) is a private method
// It is used exclusively in CodegenContext.generateExpressions which is public
// and does the elimination when it is enabled
// Note the doSubexpressionElimination flag is on
// Triggers the subexpressionElimination private method
ctx.generateExpressions(expressions, doSubexpressionElimination = true)
// subexpressionElimination private method uses ctx.equivalentExpressions
val commonExprs = ctx.equivalentExpressions.getAllEquivalentExprs
assert(commonExprs.length > 0, "No common expressions found")