Expression¶
Expression is an extension of the TreeNode abstraction for executable expressions (in the Catalyst Tree Manipulation Framework).
abstract class Expression extends TreeNode[Expression]
Expression is an executable TreeNode that can be evaluated to produce a JVM object (for an InternalRow) in either the faster code-generated mode or the slower interpreted mode.
Contract¶
dataType¶
dataType: DataType
The DataType of the result of evaluating this expression
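As a small illustration of dataType, the following sketch checks the result types of a few built-in expressions. It assumes a spark-shell session (or any classpath with spark-catalyst); the choice of Literal and Add is only for demonstration.

```scala
import org.apache.spark.sql.catalyst.expressions.{Add, Literal}
import org.apache.spark.sql.types.{IntegerType, StringType}

// Literal infers its DataType from the Scala value it wraps
assert(Literal(1).dataType == IntegerType)
assert(Literal("hello").dataType == StringType)

// A binary arithmetic expression over two integers evaluates to an integer
assert(Add(Literal(1), Literal(2)).dataType == IntegerType)
```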
doGenCode¶
doGenCode(
ctx: CodegenContext,
ev: ExprCode): ExprCode
Code-generated expression evaluation that generates Java source code (used to evaluate the expression in a more optimized way, bypassing eval).
Used when:
- Expression is requested to generate Java code
Interpreted Expression Evaluation¶
eval(
input: InternalRow = null): Any
Interpreted expression evaluation that evaluates this expression to a JVM object for a given InternalRow (without generating the corresponding Java code).
eval is a slower "relative" of the code-generated expression evaluation.
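To show eval working against an actual input row, here is a minimal sketch that reads a column by position and adds one to it. BoundReference and the one-column InternalRow are choices made for this illustration only; the snippet assumes spark-catalyst is on the classpath (e.g. spark-shell).

```scala
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions.{Add, BoundReference, Literal}
import org.apache.spark.sql.types.IntegerType

// Expression tree: (integer at ordinal 0 of the input row) + 1
val plusOne = Add(BoundReference(0, IntegerType, nullable = false), Literal(1))

// Interpreted evaluation walks the tree for the given row (no Java code is generated)
val result: Any = plusOne.eval(InternalRow(41))
assert(result == 42)
```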
nullable¶
nullable: Boolean
Whether evaluating this expression can produce null
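The contract above can be sketched with a tiny custom expression. Answer is a hypothetical name invented for this example, not part of Spark. The sketch mixes in CodegenFallback so that doGenCode is provided in terms of eval, and assumes a Spark 3.2 or later classpath, where LeafExpression needs no extra tree-manipulation overrides.

```scala
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions.LeafExpression
import org.apache.spark.sql.catalyst.expressions.codegen.CodegenFallback
import org.apache.spark.sql.types.{DataType, IntegerType}

// Hypothetical expression (for illustration only) that always evaluates to 42
case class Answer() extends LeafExpression with CodegenFallback {
  override def dataType: DataType = IntegerType      // type of the result
  override def nullable: Boolean = false             // never evaluates to null
  override def eval(input: InternalRow): Any = 42    // interpreted evaluation
}

val answer = Answer()
assert(answer.dataType == IntegerType)
assert(!answer.nullable)
assert(answer.eval() == 42)
```

CodegenFallback trades speed for simplicity: the expression still takes part in code-generated plans, but its generated code calls back into eval.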
Implementations¶
BinaryExpression¶
LeafExpression¶
TernaryExpression¶
Other Expressions¶
- CodegenFallback
- ExpectsInputTypes
- NamedExpression
- Nondeterministic
- Predicate
- UnaryExpression
- Unevaluable
- many others
Code-Generated Expression Evaluation¶
genCode(
ctx: CodegenContext): ExprCode
genCode returns an ExprCode with Java source code for expression evaluation (on an input InternalRow).
Similar to doGenCode but supports expression reuse using Subexpression Elimination.
genCode is a faster "relative" of the interpreted expression evaluation.
genCode is used when:
- CodegenContext is requested to subexpressionEliminationForWholeStageCodegen, subexpressionElimination and generateExpressions
- GenerateSafeProjection utility is used to create a Projection
- others
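To see what genCode hands back, the sketch below generates code for a small expression and prints the parts of the resulting ExprCode. It assumes spark-catalyst on the classpath (e.g. spark-shell); the exact Java source printed depends on the Spark version and configuration, so no output is shown.

```scala
import org.apache.spark.sql.catalyst.expressions.{Add, Literal}
import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, ExprCode}

// A fresh context that collects the state needed for code generation
val ctx = new CodegenContext

// Ask the expression for the Java source that evaluates it
val ev: ExprCode = Add(Literal(1), Literal(2)).genCode(ctx)

println(ev.code)    // Java source code that computes the result
println(ev.value)   // Java term that holds the result
println(ev.isNull)  // Java term that tells whether the result is null
```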
reduceCodeSize¶
reduceCodeSize(
ctx: CodegenContext,
eval: ExprCode): Unit
reduceCodeSize does its work only when all of the following are met:
- Length of the generated code is above spark.sql.codegen.methodSplitThreshold
- INPUT_ROW (of the input CodegenContext) is defined
- currentVars (of the input CodegenContext) is not defined
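The three conditions can be approximated from the outside, as in the following sketch. It does not call reduceCodeSize itself; it only re-checks the same conditions on code that genCode has already produced, so treat it as an illustration. It assumes that SQLConf.get.methodSplitThreshold and the INPUT_ROW and currentVars members of CodegenContext are accessible as in Spark 3.x.

```scala
import org.apache.spark.sql.catalyst.expressions.{Add, Literal}
import org.apache.spark.sql.catalyst.expressions.codegen.CodegenContext
import org.apache.spark.sql.internal.SQLConf

val ctx = new CodegenContext
val ev = Add(Literal(1), Literal(2)).genCode(ctx)

// 1. Is the generated code longer than spark.sql.codegen.methodSplitThreshold?
val aboveThreshold = ev.code.toString.length > SQLConf.get.methodSplitThreshold
// 2. Is INPUT_ROW defined?
val inputRowDefined = ctx.INPUT_ROW != null
// 3. Is currentVars not defined?
val currentVarsNotDefined = ctx.currentVars == null

// All three have to hold for the code to be moved to a separate function
val wouldSplit = aboveThreshold && inputRowDefined && currentVarsNotDefined
println(s"aboveThreshold=$aboveThreshold wouldSplit=$wouldSplit")
```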
This needs your help
FIXME When would the above not be met? What's so special about such an expression?
reduceCodeSize sets the value of the input ExprCode to the fresh term name for the value name.
In the end, reduceCodeSize sets the code of the input ExprCode to the following:
[javaType] [newValue] = [funcFullName]([INPUT_ROW]);
The funcFullName is the fresh term name for the name of the current expression node.
deterministic Flag¶
Expression is deterministic when it evaluates to the same result for the same input(s). An expression is deterministic if all of its child expressions are.
Note
A deterministic expression is like a pure function in functional programming languages.
val e = $"a".expr
import org.apache.spark.sql.catalyst.expressions.Expression
assert(e.isInstanceOf[Expression])
assert(e.deterministic)
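For contrast, the following sketch shows a non-deterministic expression and how the flag propagates from a child to its parent. Rand is used here only as a convenient built-in example of a Nondeterministic expression; the snippet assumes spark-catalyst on the classpath.

```scala
import org.apache.spark.sql.catalyst.expressions.{Add, Literal, Rand}

// An expression over deterministic children is deterministic
val pure = Add(Literal(1), Literal(2))
assert(pure.deterministic)

// Rand (with seed 0) is a Nondeterministic expression
val random = Rand(0L)
assert(!random.deterministic)

// A single non-deterministic child makes the parent non-deterministic
val mixed = Add(random, Literal(1.0))
assert(!mixed.deterministic)
```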
Demo¶
// evaluating an expression
// Use Literal expression to create an expression from a Scala object
import org.apache.spark.sql.catalyst.expressions.{Expression, Literal}
val e: Expression = Literal("hello")
import org.apache.spark.sql.catalyst.expressions.EmptyRow
val v: Any = e.eval(EmptyRow)
// Convert to Scala's String
import org.apache.spark.unsafe.types.UTF8String
val s = v.asInstanceOf[UTF8String].toString
assert(s == "hello")