ArrowPythonRunner

ArrowPythonRunner is a BasePythonRunner with Iterator[InternalRow] input and ColumnarBatch (vectorized) output.

ArrowPythonRunner supports BasicPythonArrowInput and BasicPythonArrowOutput.

Creating Instance

ArrowPythonRunner takes the following to be created:

ArrowPythonRunner is created when the following physical operators (Spark SQL) are executed:

BasePythonRunner

bufferSize: Int

bufferSize is part of the BasePythonRunner abstraction.

bufferSize is the value of the spark.sql.execution.pandas.udf.buffer.size configuration property.
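
As a minimal sketch of how this property might be set from PySpark, the helper below applies it to a SparkSession builder. The helper name with_udf_buffer_size is hypothetical and not part of Spark; only the property name comes from the text above.

```python
# Property name as documented above.
BUFFER_SIZE_KEY = "spark.sql.execution.pandas.udf.buffer.size"


def with_udf_buffer_size(builder, size_in_bytes: int):
    """Return `builder` with the buffer size property set.

    `builder` is expected to be a pyspark.sql.SparkSession.builder
    (anything with a chainable `config(key, value)` method works).
    This helper is a hypothetical convenience, not a Spark API.
    """
    return builder.config(BUFFER_SIZE_KEY, str(size_in_bytes))
```

With PySpark installed, it could be used as `with_udf_buffer_size(SparkSession.builder, 128 * 1024).getOrCreate()`, where 128 KiB is only an illustrative value.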

BasePythonRunner

simplifiedTraceback: Boolean

simplifiedTraceback is part of the BasePythonRunner abstraction.

simplifiedTraceback is the value of the spark.sql.execution.pyspark.udf.simplifiedTraceback.enabled configuration property.
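
A small sketch of toggling this property on an existing session, assuming it can be changed at runtime through the session's runtime configuration. The function name set_simplified_traceback is hypothetical; only the property name comes from the text above.

```python
# Property name as documented above.
TRACEBACK_KEY = "spark.sql.execution.pyspark.udf.simplifiedTraceback.enabled"


def set_simplified_traceback(spark, enabled: bool) -> None:
    """Set the simplified-traceback property on a session.

    `spark` is expected to be a pyspark.sql.SparkSession; the value is
    passed as the string "true" or "false".
    This helper is a hypothetical convenience, not a Spark API.
    """
    spark.conf.set(TRACEBACK_KEY, "true" if enabled else "false")
```

Given a running session `spark`, calling `set_simplified_traceback(spark, True)` would then be reflected by `spark.conf.get(TRACEBACK_KEY)`.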