# ArrowEvalPythonExec Physical Operator
ArrowEvalPythonExec is an EvalPythonExec physical operator that evaluates scalar PythonUDFs using ArrowPythonRunner.
ArrowEvalPythonExec represents the ArrowEvalPython logical operator at execution time.
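A quick way to see the operator is to inspect the executed plan of a query that uses a scalar (pandas) Python UDF. The sketch below assumes such a DataFrame `df` already exists (the `df` and `arrowEvalOps` names are illustrative, not from the Spark sources):

```scala
import org.apache.spark.sql.execution.python.ArrowEvalPythonExec

// df is assumed to be a DataFrame whose query uses a scalar (pandas) Python UDF
val arrowEvalOps = df.queryExecution.executedPlan.collect {
  case op: ArrowEvalPythonExec => op
}
assert(arrowEvalOps.nonEmpty, "Expected ArrowEvalPythonExec in the physical plan")
```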
## Creating Instance
ArrowEvalPythonExec takes the following to be created:
- Scalar PythonUDFs
- Result Attributes (Spark SQL)
- Child SparkPlan (Spark SQL)
- Eval Type
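The exact parameter names are internal to Spark, but structurally the operator is a case class over those four arguments. A minimal, self-contained sketch (placeholder traits stand in for Spark's PythonUDF, Attribute and SparkPlan types; the parameter names are assumptions):

```scala
// Structural sketch only: placeholder traits let the snippet compile on its own.
object CreatingInstanceSketch {
  trait PythonUDF
  trait Attribute
  trait SparkPlan

  case class ArrowEvalPythonExecSketch(
      udfs: Seq[PythonUDF],        // scalar PythonUDFs
      resultAttrs: Seq[Attribute], // result attributes
      child: SparkPlan,            // child physical operator
      evalType: Int)               // eval type
}
```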
ArrowEvalPythonExec is created when:
- PythonEvals physical execution strategy is executed (and plans ArrowEvalPython logical operators)
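A self-contained sketch of that mapping follows (placeholder types stand in for Spark's internals and the pattern is simplified; the real PythonEvals strategy handles more cases and arguments):

```scala
// Sketch of how a planning strategy could turn ArrowEvalPython (logical)
// into ArrowEvalPythonExec (physical).
object PlanningSketch {
  trait LogicalPlan
  trait SparkPlan
  case class ArrowEvalPython(udfs: Seq[String], child: LogicalPlan) extends LogicalPlan
  case class ArrowEvalPythonExec(udfs: Seq[String], child: SparkPlan) extends SparkPlan

  // planLater would normally defer planning of the child; here it is a stub
  def planLater(plan: LogicalPlan): SparkPlan = new SparkPlan {}

  def strategy(plan: LogicalPlan): Seq[SparkPlan] = plan match {
    case ArrowEvalPython(udfs, child) =>
      ArrowEvalPythonExec(udfs, planLater(child)) :: Nil
    case _ => Nil
  }
}
```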
## Performance Metrics
ArrowEvalPythonExec is a PythonSQLMetrics.
## Maximum Records per Batch
batchSize is the value of the spark.sql.execution.arrow.maxRecordsPerBatch configuration property.
batchSize is used while evaluating PythonUDFs.
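For example, the limit can be set when building a SparkSession (the value 5000 below is arbitrary):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("arrow-max-records-per-batch")
  // Cap the number of rows per Arrow record batch sent to Python workers
  .config("spark.sql.execution.arrow.maxRecordsPerBatch", "5000")
  .getOrCreate()

// The effective value that batchSize picks up
println(spark.conf.get("spark.sql.execution.arrow.maxRecordsPerBatch"))
```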
## Evaluating PythonUDFs
```scala
evaluate(
  funcs: Seq[ChainedPythonFunctions],
  argOffsets: Array[Array[Int]],
  iter: Iterator[InternalRow],
  schema: StructType,
  context: TaskContext): Iterator[InternalRow]
```
evaluate is part of the EvalPythonExec abstraction.
evaluate creates an ArrowPythonRunner to compute partitions.
In the end, evaluate converts ColumnarBatches into InternalRows.
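The concrete runner wiring lives in Spark's sources, but the overall data flow can be illustrated with a self-contained sketch (the simplified types and the EvaluateSketch, computeWithRunner and udf names are made up for illustration): input rows are grouped into batches of at most batchSize rows, each batch goes through a runner (ArrowPythonRunner in Spark), and the columnar results are flattened back into rows.

```scala
object EvaluateSketch {
  type InternalRow = Seq[Any]
  case class ColumnarBatch(rows: Seq[InternalRow]) {
    def rowIterator: Iterator[InternalRow] = rows.iterator
  }

  // Stand-in for ArrowPythonRunner.compute: applies a per-row function batch by batch
  def computeWithRunner(
      batches: Iterator[Seq[InternalRow]],
      udf: InternalRow => InternalRow): Iterator[ColumnarBatch] =
    batches.map(batch => ColumnarBatch(batch.map(udf)))

  def evaluate(
      iter: Iterator[InternalRow],
      batchSize: Int,
      udf: InternalRow => InternalRow): Iterator[InternalRow] = {
    val batchIter = iter.grouped(batchSize)   // batching (Spark uses a dedicated batch iterator)
    val columnarBatchIter = computeWithRunner(batchIter, udf)
    columnarBatchIter.flatMap(_.rowIterator)  // back to InternalRows
  }

  // Usage: doubles every value, at most two rows per batch
  // evaluate(Iterator(Seq(1), Seq(2), Seq(3)), batchSize = 2, row => row.map { case i: Int => i * 2 })
}
```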