ArrowPythonRunner¶
ArrowPythonRunner is a BasePythonRunner with Iterator[InternalRow] input and ColumnarBatch (vectorized) output.
ArrowPythonRunner supports BasicPythonArrowInput and BasicPythonArrowOutput.
Creating Instance¶
ArrowPythonRunner takes the following to be created:
- ChainedPythonFunctions
- Eval Type
- Argument Offsets
- Schema (Spark SQL)
- TimeZone ID
- Worker Configuration
- Performance Metrics
ArrowPythonRunner is created when the following physical operators (Spark SQL) are executed:
- AggregateInPandasExec
- ArrowEvalPythonExec
- FlatMapGroupsInPandasExec
- MapInPandasExec
- WindowInPandasExec
bufferSize¶
bufferSize is the value of spark.sql.execution.pandas.udf.buffer.size configuration property.
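As a minimal sketch of how this property could be supplied from the PySpark side, the helper below sets it on a session builder. The helper names (with_pandas_udf_buffer_size, build_session), the application name, and the value 131072 are illustrative assumptions, not part of ArrowPythonRunner; the value is assumed to be a size in bytes.

```python
# Configuration key named in the section above.
BUFFER_SIZE_KEY = "spark.sql.execution.pandas.udf.buffer.size"


def with_pandas_udf_buffer_size(builder, size_bytes):
    """Hypothetical helper: set the pandas UDF buffer size on a session builder.

    `builder` is expected to behave like `SparkSession.builder`, i.e. expose
    `config(key, value)` and return a builder from it.
    """
    if size_bytes <= 0:
        raise ValueError("size_bytes must be a positive integer")
    return builder.config(BUFFER_SIZE_KEY, str(size_bytes))


def build_session(size_bytes=131072):
    """Usage sketch; requires PySpark. 131072 is an arbitrary example value."""
    # Imported lazily so the helper above stays importable without PySpark.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName("arrow-runner-demo")
    return with_pandas_udf_buffer_size(builder, size_bytes).getOrCreate()
```

Calling build_session() returns a SparkSession with the property set; a runner created by one of the operators listed earlier would then be expected to pick that value up as its bufferSize.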
simplifiedTraceback¶
BasePythonRunner
simplifiedTraceback: Boolean
simplifiedTraceback is part of the BasePythonRunner abstraction.
simplifiedTraceback is the value of spark.sql.execution.pyspark.udf.simplifiedTraceback.enabled configuration property.
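The property can presumably also be toggled on an existing session through its runtime configuration. The sketch below assumes the property is settable at runtime and that the session object exposes conf.set(key, value) as SparkSession does; the helper name set_simplified_traceback is hypothetical.

```python
# Configuration key named in the section above.
SIMPLIFIED_TRACEBACK_KEY = (
    "spark.sql.execution.pyspark.udf.simplifiedTraceback.enabled"
)


def set_simplified_traceback(spark, enabled):
    """Hypothetical helper: toggle simplified Python UDF tracebacks.

    `spark` is expected to behave like a SparkSession, i.e. expose
    `spark.conf.set(key, value)`.
    """
    # Configuration values are passed as the strings "true" / "false".
    spark.conf.set(SIMPLIFIED_TRACEBACK_KEY, "true" if enabled else "false")
```

With a live session, set_simplified_traceback(spark, True) would apply to runners created after the call, since the value is read when a runner is created.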