Expression¶
Expression is an extension of the TreeNode abstraction for executable expressions (in the Catalyst Tree Manipulation Framework).
abstract class Expression
extends TreeNode[Expression]
Expression is an executable TreeNode that can be evaluated and produce a JVM object (for an InternalRow) in the faster code-generated or the slower interpreted modes.
Contract¶
Evaluation Result DataType¶
dataType: DataType
The DataType of the result of evaluating this expression
Generating Java Source Code for Code-Generated Expression Evaluation¶
doGenCode(
ctx: CodegenContext,
ev: ExprCode): ExprCode
Generates a Java source code for Whole-Stage Java Code Generation execution
See ScalaUDF
Used when:
Expressionis requested to generate a Java code
Interpreted Expression Evaluation¶
eval(
input: InternalRow = null): Any
Interpreted expression evaluation that evaluates this expression to a JVM object for a given InternalRow (and skipping generating a corresponding Java code)
eval is a slower "relative" of the code-generated expression evaluation
nullable¶
nullable: Boolean
Implementations¶
BinaryExpression¶
LeafExpression¶
TernaryExpression¶
Other Expressions¶
- CodegenFallback
- ExpectsInputTypes
- NamedExpression
- Nondeterministic
- Predicate
- UnaryExpression
- Unevaluable
- others
Code-Generated Expression Evaluation¶
genCode(
ctx: CodegenContext): ExprCode
genCode returns an ExprCode with a Java source code for code-generated expression evaluation.
genCode is doGenCode but does Subexpression Elimination.
genCode is a faster "relative" of the interpreted expression evaluation.
genCode is used when:
CodegenContextis requested to subexpressionEliminationForWholeStageCodegen and generateExpressions (with subexpressionElimination)GenerateSafeProjectionutility is used to create a Projection- others
reduceCodeSize¶
reduceCodeSize(
ctx: CodegenContext,
eval: ExprCode): Unit
reduceCodeSize does its work only when all of the following are met:
-
Length of the generated code is above spark.sql.codegen.methodSplitThreshold
-
INPUT_ROW (of the input
CodegenContext) is defined -
currentVars (of the input
CodegenContext) is not defined
This needs your help
FIXME When would the above not be met? What's so special about such an expression?
reduceCodeSize sets the value of the input ExprCode to the fresh term name for the value name.
In the end, reduceCodeSize sets the code of the input ExprCode to the following:
[javaType] [newValue] = [funcFullName]([INPUT_ROW]);
The funcFullName is the fresh term name for the name of the current expression node.
deterministic¶
Expression is deterministic when evaluates to the same result for the same input(s). An expression is deterministic if all the child expressions are.
Note
A deterministic expression is like a pure function in functional programming languages.
val e = $"a".expr
import org.apache.spark.sql.catalyst.expressions.Expression
assert(e.isInstanceOf[Expression])
assert(e.deterministic)
foldable¶
foldable: Boolean
foldable is false (and is expected to be overriden by implementations).
foldable expression is a candidate for static evaluation before the query is executed.
See:
Demo¶
// evaluating an expression
// Use Literal expression to create an expression from a Scala object
import org.apache.spark.sql.catalyst.expressions.{Expression, Literal}
val e: Expression = Literal("hello")
import org.apache.spark.sql.catalyst.expressions.EmptyRow
val v: Any = e.eval(EmptyRow)
// Convert to Scala's String
import org.apache.spark.unsafe.types.UTF8String
val s = v.asInstanceOf[UTF8String].toString
assert(s == "hello")