ArrowPythonRunner¶
ArrowPythonRunner is a BasePythonRunner with Iterator[InternalRow] input and ColumnarBatch (vectorized) output.
ArrowPythonRunner mixes in the BasicPythonArrowInput and BasicPythonArrowOutput traits.
Creating Instance¶
ArrowPythonRunner takes the following to be created:
- ChainedPythonFunctions (one or more)
- Eval Type
- Argument Offsets
- Schema (Spark SQL)
- TimeZone ID
- Worker Configuration
- Performance Metrics
ArrowPythonRunner is created when the following physical operators (Spark SQL) are executed:
- AggregateInPandasExec
- ArrowEvalPythonExec
- FlatMapGroupsInPandasExec
- MapInPandasExec
- WindowInPandasExec
bufferSize¶
bufferSize is the value of spark.sql.execution.pandas.udf.buffer.size configuration property.
simplifiedTraceback¶
BasePythonRunner
simplifiedTraceback: Boolean
simplifiedTraceback is part of the BasePythonRunner abstraction.
simplifiedTraceback is the value of spark.sql.execution.pyspark.udf.simplifiedTraceback.enabled configuration property.
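Both bufferSize and simplifiedTraceback are read from ordinary Spark configuration properties, so they can be set like any other property. The following is a minimal sketch, assuming the properties are passed on the command line with spark-submit's `--conf` option. The helper name `to_conf_args` and the values shown are hypothetical and chosen for illustration only; they are not stated defaults.

```python
# Property names are taken from the text above; the values are illustrative only.
ARROW_RUNNER_CONF = {
    "spark.sql.execution.pandas.udf.buffer.size": 65536,
    "spark.sql.execution.pyspark.udf.simplifiedTraceback.enabled": True,
}


def to_conf_args(conf):
    """Render a dict of Spark properties as spark-submit --conf arguments.

    Hypothetical helper, not part of Spark or PySpark.
    """
    args = []
    for key, value in conf.items():
        # Render booleans as lowercase true/false, the conventional spelling
        # in Spark configuration.
        if isinstance(value, bool):
            value = str(value).lower()
        args.extend(["--conf", f"{key}={value}"])
    return args


conf_args = to_conf_args(ARROW_RUNNER_CONF)
print(" ".join(conf_args))
```

The same key/value pairs could instead be passed to `SparkSession.builder.config(key, value)` when building a session.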