ArrowEvalPythonExec Physical Operator¶
ArrowEvalPythonExec
is an EvalPythonExec physical operator to evaluate scalar PythonUDFs using ArrowPythonRunner.
ArrowEvalPythonExec
represents ArrowEvalPython logical operator at execution time.
Creating Instance¶
ArrowEvalPythonExec
takes the following to be created:
- Scalar PythonUDFs
- Result
Attribute
s (Spark SQL) - Child
SparkPlan
(Spark SQL) - Eval Type
ArrowEvalPythonExec
is created when:
PythonEvals
physical execution strategy is executed (and plans ArrowEvalPython logical operators)
Performance Metrics¶
ArrowEvalPythonExec
is a PythonSQLMetrics.
Maximum Records per Batch¶
batchSize
is the value of spark.sql.execution.arrow.maxRecordsPerBatch configuration property.
batchSize
is used while evaluating PythonUDFs.
Evaluating PythonUDFs¶
EvalPythonExec
evaluate(
funcs: Seq[ChainedPythonFunctions],
argOffsets: Array[Array[Int]],
iter: Iterator[InternalRow],
schema: StructType,
context: TaskContext): Iterator[InternalRow]
evaluate
is part of the EvalPythonExec abstraction.
evaluate
creates an ArrowPythonRunner to compute partitions.
In the end, evaluate
converts ColumnarBatch
es into InternalRow
s.