EvalPythonExec Unary Physical Operators

EvalPythonExec is an extension of the UnaryExecNode (Spark SQL) abstraction for unary physical operators that evaluate PythonUDFs (when executed).
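
The contract can be summarized with the following simplified Scala sketch (assembled from the members documented below; package locations are assumed to follow the Spark 3.x source tree and details such as access modifiers may differ):

import org.apache.spark.TaskContext
import org.apache.spark.api.python.ChainedPythonFunctions
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions.{Attribute, PythonUDF}
import org.apache.spark.sql.execution.UnaryExecNode
import org.apache.spark.sql.types.StructType

trait EvalPythonExec extends UnaryExecNode {

  // Python UDFs to evaluate
  def udfs: Seq[PythonUDF]

  // Result attributes of the Python UDF evaluation
  def resultAttrs: Seq[Attribute]

  // Evaluates the Python UDFs over the rows of a single partition
  protected def evaluate(
    funcs: Seq[ChainedPythonFunctions],
    argOffsets: Array[Array[Int]],
    iter: Iterator[InternalRow],
    schema: StructType,
    context: TaskContext): Iterator[InternalRow]
}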

Contract

Evaluating PythonUDFs

evaluate(
  funcs: Seq[ChainedPythonFunctions],
  argOffsets: Array[Array[Int]],
  iter: Iterator[InternalRow],
  schema: StructType,
  context: TaskContext): Iterator[InternalRow]

Used when:

  • EvalPythonExec physical operator is requested to doExecute
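
For orientation, here is a purely illustrative (and hypothetical) override of evaluate that skips Python entirely and emits one null result per chained function group; real implementations launch Python workers and stream rows to and from them (imports as in the trait sketch above):

// Toy override: demonstrates the signature and the shape of the result only.
// evaluate is expected to return rows holding just the UDF results;
// doExecute later combines them with the original input rows.
protected override def evaluate(
    funcs: Seq[ChainedPythonFunctions],
    argOffsets: Array[Array[Int]],
    iter: Iterator[InternalRow],
    schema: StructType,
    context: TaskContext): Iterator[InternalRow] =
  iter.map { _ =>
    InternalRow.fromSeq(Seq.fill(funcs.length)(null))
  }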

Result Attributes

resultAttrs: Seq[Attribute]

The result Attributes (Spark SQL) of evaluating the Python UDFs (appended to the output of the child operator)

Used when:

  • EvalPythonExec physical operator is requested for the output and producedAttributes
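
In the Spark 3.x sources, resultAttrs is wired into those two members roughly as follows (a simplified sketch, not a verbatim copy):

// The operator's output is the child's output followed by the UDF result attributes,
// and resultAttrs is also what this operator itself produces (for attribute pruning).
override def output: Seq[Attribute] = child.output ++ resultAttrs

override def producedAttributes: AttributeSet = AttributeSet(resultAttrs)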

Python UDFs

udfs: Seq[PythonUDF]

PythonUDFs to evaluate

Used when:

  • EvalPythonExec physical operator is requested to doExecute

Implementations

  • ArrowEvalPythonExec
  • BatchEvalPythonExec

Executing Physical Operator

doExecute(): RDD[InternalRow]

doExecute is part of the SparkPlan (Spark SQL) abstraction.

The gist of doExecute is to evaluate Python UDFs (for every InternalRow) with some pre- and post-processing.


doExecute requests the child physical operator to execute (to produce an input RDD[InternalRow]).

Note

EvalPythonExecs are UnaryExecNodes (Spark SQL).

doExecute uses the RDD.mapPartitions operator to execute a function over every partition of the input RDD[InternalRow].
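
For illustration only, the snippet below shows the general RDD.mapPartitions pattern that doExecute relies on: a function that runs once per partition, can set up per-partition state (as doExecute does with its projections), and then processes the rows one by one. The session setup and the doubling logic are placeholders, not part of EvalPythonExec.

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[2]").appName("mapPartitions-demo").getOrCreate()
val rdd = spark.sparkContext.parallelize(1 to 10, numSlices = 2)

val doubled = rdd.mapPartitions { iter =>
  // per-partition setup would go here (e.g. creating and initializing a projection)
  iter.map(_ * 2)
}

doubled.collect().foreach(println)
spark.stop()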

For every partition, doExecute creates a MutableProjection for the UDF inputs (bound to the child operator's output attributes) and requests it to initialize.

doExecute evaluates Python UDFs (for every InternalRow).

In the end, doExecute creates an UnsafeProjection for the output attributes and maps it over the rows from evaluating the Python UDFs (to produce the final output rows).
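
Putting the steps together, the overall flow of doExecute can be outlined with the following simplified sketch (based on the Spark 3.x sources; names such as udfInputExprs, pyFuncs, argOffsets and udfInputSchema are illustrative, and the spillable row queue Spark uses to buffer input rows is replaced with a plain in-memory queue):

// Simplified outline of doExecute (imports as in the trait sketch above, plus
// org.apache.spark.rdd.RDD and MutableProjection, UnsafeProjection, JoinedRow
// from org.apache.spark.sql.catalyst.expressions).
protected override def doExecute(): RDD[InternalRow] = {
  // 1. Execute the child physical operator to produce the input RDD[InternalRow]
  val inputRDD = child.execute()

  // 2. Process every partition with RDD.mapPartitions
  inputRDD.mapPartitions { iter =>
    val context = TaskContext.get()
    val buffered = scala.collection.mutable.Queue.empty[InternalRow]

    // 3. Project the incoming rows down to the UDF inputs
    //    (MutableProjection bound to the child operator's output attributes);
    //    udfInputExprs is an illustrative name for the UDF argument expressions
    val projection = MutableProjection.create(udfInputExprs, child.output)
    projection.initialize(context.partitionId())
    val projectedIter = iter.map { row =>
      buffered += row.copy()   // keep the original row for the final join
      projection(row)
    }

    // 4. Evaluate the Python UDFs (the EvalPythonExec contract);
    //    pyFuncs, argOffsets and udfInputSchema are illustrative names (derived from udfs)
    val udfResults = evaluate(pyFuncs, argOffsets, projectedIter, udfInputSchema, context)

    // 5. Join every original row with its UDF results and convert to UnsafeRows
    //    with an UnsafeProjection over the output attributes
    val joined = new JoinedRow
    val resultProjection = UnsafeProjection.create(output, output)
    udfResults.map { udfRow =>
      resultProjection(joined(buffered.dequeue(), udfRow))
    }
  }
}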