Skip to content

Expression

Expression is an extension of the TreeNode abstraction for executable expressions (in the Catalyst Tree Manipulation Framework).

abstract class Expression
extends TreeNode[Expression]

Expression is an executable TreeNode that can be evaluated and produce a JVM object (for an InternalRow) in the faster code-generated or the slower interpreted modes.

Contract

Evaluation Result DataType

dataType: DataType

The DataType of the result of evaluating this expression

Generating Java Source Code for Code-Generated Expression Evaluation

doGenCode(
  ctx: CodegenContext,
  ev: ExprCode): ExprCode

Generates a Java source code for Whole-Stage Java Code Generation execution

See ScalaUDF

Used when:

Interpreted Expression Evaluation

eval(
  input: InternalRow = null): Any

Interpreted expression evaluation that evaluates this expression to a JVM object for a given InternalRow (and skipping generating a corresponding Java code)

eval is a slower "relative" of the code-generated expression evaluation

nullable

nullable: Boolean

Implementations

BinaryExpression

LeafExpression

TernaryExpression

Other Expressions

Code-Generated Expression Evaluation

genCode(
  ctx: CodegenContext): ExprCode

genCode returns an ExprCode with a Java source code for code-generated expression evaluation.

genCode is doGenCode but does Subexpression Elimination.

genCode is a faster "relative" of the interpreted expression evaluation.


genCode is used when:

reduceCodeSize

reduceCodeSize(
  ctx: CodegenContext,
  eval: ExprCode): Unit

reduceCodeSize does its work only when all of the following are met:

  1. Length of the generated code is above spark.sql.codegen.methodSplitThreshold

  2. INPUT_ROW (of the input CodegenContext) is defined

  3. currentVars (of the input CodegenContext) is not defined

This needs your help

FIXME When would the above not be met? What's so special about such an expression?

reduceCodeSize sets the value of the input ExprCode to the fresh term name for the value name.

In the end, reduceCodeSize sets the code of the input ExprCode to the following:

[javaType] [newValue] = [funcFullName]([INPUT_ROW]);

The funcFullName is the fresh term name for the name of the current expression node.

deterministic

Expression is deterministic when evaluates to the same result for the same input(s). An expression is deterministic if all the child expressions are.

Note

A deterministic expression is like a pure function in functional programming languages.

val e = $"a".expr

import org.apache.spark.sql.catalyst.expressions.Expression
assert(e.isInstanceOf[Expression])
assert(e.deterministic)

foldable

foldable: Boolean

foldable is false (and is expected to be overriden by implementations).

foldable expression is a candidate for static evaluation before the query is executed.

See:

Demo

// evaluating an expression
// Use Literal expression to create an expression from a Scala object
import org.apache.spark.sql.catalyst.expressions.{Expression, Literal}
val e: Expression = Literal("hello")

import org.apache.spark.sql.catalyst.expressions.EmptyRow
val v: Any = e.eval(EmptyRow)

// Convert to Scala's String
import org.apache.spark.unsafe.types.UTF8String
val s = v.asInstanceOf[UTF8String].toString
assert(s == "hello")