Expression¶
Expression
is an extension of the TreeNode abstraction for executable expressions (in the Catalyst Tree Manipulation Framework).
abstract class Expression
extends TreeNode[Expression]
Expression
is an executable TreeNode that can be evaluated and produce a JVM object (for an InternalRow) in the faster code-generated or the slower interpreted modes.
Contract¶
Evaluation Result DataType¶
dataType: DataType
The DataType of the result of evaluating this expression
Generating Java Source Code for Code-Generated Expression Evaluation¶
doGenCode(
ctx: CodegenContext,
ev: ExprCode): ExprCode
Generates a Java source code for Whole-Stage Java Code Generation execution
See ScalaUDF
Used when:
Expression
is requested to generate a Java code
Interpreted Expression Evaluation¶
eval(
input: InternalRow = null): Any
Interpreted expression evaluation that evaluates this expression to a JVM object for a given InternalRow (and skipping generating a corresponding Java code)
eval
is a slower "relative" of the code-generated expression evaluation
nullable¶
nullable: Boolean
Implementations¶
BinaryExpression¶
LeafExpression¶
TernaryExpression¶
Other Expressions¶
- CodegenFallback
- ExpectsInputTypes
- NamedExpression
- Nondeterministic
- Predicate
- UnaryExpression
- Unevaluable
- others
Code-Generated Expression Evaluation¶
genCode(
ctx: CodegenContext): ExprCode
genCode
returns an ExprCode
with a Java source code for code-generated expression evaluation.
genCode
is doGenCode but does Subexpression Elimination.
genCode
is a faster "relative" of the interpreted expression evaluation.
genCode
is used when:
CodegenContext
is requested to subexpressionEliminationForWholeStageCodegen and generateExpressions (with subexpressionElimination)GenerateSafeProjection
utility is used to create a Projection- others
reduceCodeSize¶
reduceCodeSize(
ctx: CodegenContext,
eval: ExprCode): Unit
reduceCodeSize
does its work only when all of the following are met:
-
Length of the generated code is above spark.sql.codegen.methodSplitThreshold
-
INPUT_ROW (of the input
CodegenContext
) is defined -
currentVars (of the input
CodegenContext
) is not defined
This needs your help
FIXME When would the above not be met? What's so special about such an expression?
reduceCodeSize
sets the value
of the input ExprCode
to the fresh term name for the value
name.
In the end, reduceCodeSize
sets the code of the input ExprCode
to the following:
[javaType] [newValue] = [funcFullName]([INPUT_ROW]);
The funcFullName
is the fresh term name for the name of the current expression node.
deterministic¶
Expression
is deterministic when evaluates to the same result for the same input(s). An expression is deterministic if all the child expressions are.
Note
A deterministic expression is like a pure function in functional programming languages.
val e = $"a".expr
import org.apache.spark.sql.catalyst.expressions.Expression
assert(e.isInstanceOf[Expression])
assert(e.deterministic)
foldable¶
foldable: Boolean
foldable
is false
(and is expected to be overriden by implementations).
foldable
expression is a candidate for static evaluation before the query is executed.
See:
Demo¶
// evaluating an expression
// Use Literal expression to create an expression from a Scala object
import org.apache.spark.sql.catalyst.expressions.{Expression, Literal}
val e: Expression = Literal("hello")
import org.apache.spark.sql.catalyst.expressions.EmptyRow
val v: Any = e.eval(EmptyRow)
// Convert to Scala's String
import org.apache.spark.unsafe.types.UTF8String
val s = v.asInstanceOf[UTF8String].toString
assert(s == "hello")