# ArrowEvalPythonExec Physical Operator
ArrowEvalPythonExec is an EvalPythonExec physical operator that evaluates scalar PythonUDFs using ArrowPythonRunner.
ArrowEvalPythonExec represents the ArrowEvalPython logical operator at execution time.
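A quick way to see the operator is to inspect the executed plan of a query that uses a scalar (pandas) Python UDF. The sketch below assumes such a DataFrame `df` already exists (the `df` and `arrowEvalOps` names are illustrative, not from the Spark sources):

```scala
import org.apache.spark.sql.execution.python.ArrowEvalPythonExec

// df is assumed to be a DataFrame whose query uses a scalar (pandas) Python UDF
val arrowEvalOps = df.queryExecution.executedPlan.collect {
  case op: ArrowEvalPythonExec => op
}
assert(arrowEvalOps.nonEmpty, "Expected ArrowEvalPythonExec in the physical plan")
```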
## Creating Instance
ArrowEvalPythonExec takes the following to be created:
- Scalar PythonUDFs
- Result Attributes (Spark SQL)
- Child SparkPlan (Spark SQL)
- Eval Type
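The exact parameter names are internal to Spark, but structurally the operator is a case class over those four arguments. A minimal, self-contained sketch (placeholder traits stand in for Spark's PythonUDF, Attribute and SparkPlan types; the parameter names are assumptions):

```scala
// Structural sketch only: placeholder traits let the snippet compile on its own.
object CreatingInstanceSketch {
  trait PythonUDF
  trait Attribute
  trait SparkPlan

  case class ArrowEvalPythonExecSketch(
      udfs: Seq[PythonUDF],        // scalar PythonUDFs
      resultAttrs: Seq[Attribute], // result attributes
      child: SparkPlan,            // child physical operator
      evalType: Int)               // eval type
}
```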
ArrowEvalPythonExec is created when:
- PythonEvals physical execution strategy is executed (and plans ArrowEvalPython logical operators)
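A self-contained sketch of that mapping follows (placeholder types stand in for Spark's internals and the pattern is simplified; the real PythonEvals strategy handles more cases and arguments):

```scala
// Sketch of how a planning strategy could turn ArrowEvalPython (logical)
// into ArrowEvalPythonExec (physical).
object PlanningSketch {
  trait LogicalPlan
  trait SparkPlan
  case class ArrowEvalPython(udfs: Seq[String], child: LogicalPlan) extends LogicalPlan
  case class ArrowEvalPythonExec(udfs: Seq[String], child: SparkPlan) extends SparkPlan

  // planLater would normally defer planning of the child; here it is a stub
  def planLater(plan: LogicalPlan): SparkPlan = new SparkPlan {}

  def strategy(plan: LogicalPlan): Seq[SparkPlan] = plan match {
    case ArrowEvalPython(udfs, child) =>
      ArrowEvalPythonExec(udfs, planLater(child)) :: Nil
    case _ => Nil
  }
}
```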
## Performance Metrics
ArrowEvalPythonExec is a PythonSQLMetrics.
## Maximum Records per Batch
batchSize is the value of the spark.sql.execution.arrow.maxRecordsPerBatch configuration property.
batchSize is used while evaluating PythonUDFs.
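For example, the limit can be set when building a SparkSession (the value 5000 below is arbitrary):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("arrow-max-records-per-batch")
  // Cap the number of rows per Arrow record batch sent to Python workers
  .config("spark.sql.execution.arrow.maxRecordsPerBatch", "5000")
  .getOrCreate()

// The effective value that batchSize picks up
println(spark.conf.get("spark.sql.execution.arrow.maxRecordsPerBatch"))
```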
## Evaluating PythonUDFs
```scala
evaluate(
  funcs: Seq[ChainedPythonFunctions],
  argOffsets: Array[Array[Int]],
  iter: Iterator[InternalRow],
  schema: StructType,
  context: TaskContext): Iterator[InternalRow]
```
evaluate is part of the EvalPythonExec abstraction.
evaluate creates an ArrowPythonRunner to compute partitions.
In the end, evaluate converts ColumnarBatches into InternalRows.
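The concrete runner wiring lives in Spark's sources, but the overall data flow can be illustrated with a self-contained sketch (the simplified types and the EvaluateSketch, computeWithRunner and udf names are made up for illustration): input rows are grouped into batches of at most batchSize rows, each batch goes through a runner (ArrowPythonRunner in Spark), and the columnar results are flattened back into rows.

```scala
object EvaluateSketch {
  type InternalRow = Seq[Any]
  case class ColumnarBatch(rows: Seq[InternalRow]) {
    def rowIterator: Iterator[InternalRow] = rows.iterator
  }

  // Stand-in for ArrowPythonRunner.compute: applies a per-row function batch by batch
  def computeWithRunner(
      batches: Iterator[Seq[InternalRow]],
      udf: InternalRow => InternalRow): Iterator[ColumnarBatch] =
    batches.map(batch => ColumnarBatch(batch.map(udf)))

  def evaluate(
      iter: Iterator[InternalRow],
      batchSize: Int,
      udf: InternalRow => InternalRow): Iterator[InternalRow] = {
    val batchIter = iter.grouped(batchSize)   // batching (Spark uses a dedicated batch iterator)
    val columnarBatchIter = computeWithRunner(batchIter, udf)
    columnarBatchIter.flatMap(_.rowIterator)  // back to InternalRows
  }

  // Usage: doubles every value, at most two rows per batch
  // evaluate(Iterator(Seq(1), Seq(2), Seq(3)), batchSize = 2, row => row.map { case i: Int => i * 2 })
}
```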