ArrowPythonRunner

ArrowPythonRunner is a BasePythonRunner with Iterator[InternalRow] input and ColumnarBatch (vectorized) output.

ArrowPythonRunner supports BasicPythonArrowInput and BasicPythonArrowOutput.
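The "row iterator in, columnar batches out" shape described above can be pictured with a toy model. This is only a sketch in plain Python, not Spark's implementation: the real runner exchanges Arrow data with a Python worker, and the helper name `rows_to_column_batches`, the tuple rows, and the dict-of-lists batch are all assumptions made for illustration.

```python
from itertools import islice


def rows_to_column_batches(rows, column_names, batch_size):
    """Hypothetical helper: group an iterator of row tuples into column-oriented batches."""
    it = iter(rows)
    while True:
        # Pull up to batch_size rows from the input iterator.
        chunk = list(islice(it, batch_size))
        if not chunk:
            return
        # Transpose the chunk: one list per column instead of one tuple per row.
        columns = [list(col) for col in zip(*chunk)]
        yield dict(zip(column_names, columns))


# Row-oriented input, consumed lazily (stands in for Iterator[InternalRow]).
rows = iter([(1, "a"), (2, "b"), (3, "c")])

# Column-oriented output (stands in for a stream of ColumnarBatch).
batches = list(rows_to_column_batches(rows, ["id", "name"], batch_size=2))
```

Each yielded batch holds whole columns, which is what lets a consumer process many values per call instead of one row at a time.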

Creating Instance

ArrowPythonRunner takes the following to be created:

  • ChainedPythonFunctions
  • Eval Type
  • Argument Offsets
  • Schema (Spark SQL)
  • TimeZone ID
  • Worker Configuration
  • Performance Metrics

ArrowPythonRunner is created when the following physical operators (Spark SQL) are executed:

bufferSize

BasePythonRunner
bufferSize: Int

bufferSize is part of the BasePythonRunner abstraction.

bufferSize is the value of the spark.sql.execution.pandas.udf.buffer.size configuration property.

simplifiedTraceback

BasePythonRunner
simplifiedTraceback: Boolean

simplifiedTraceback is part of the BasePythonRunner abstraction.

simplifiedTraceback is the value of the spark.sql.execution.pyspark.udf.simplifiedTraceback.enabled configuration property.
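As a minimal sketch of how the two configuration properties named above might be supplied to an application, the snippet below renders them as `--conf key=value` arguments for spark-submit. The helper `to_submit_args` is hypothetical, and the values (65536 and true) are illustrative assumptions, not recommended or default settings.

```python
def to_submit_args(conf):
    """Hypothetical helper: render Spark properties as spark-submit --conf arguments."""
    args = []
    # Sort keys so the generated argument list is deterministic.
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


# Property names come from the text above; the values are illustrative only.
runner_conf = {
    "spark.sql.execution.pandas.udf.buffer.size": 65536,
    "spark.sql.execution.pyspark.udf.simplifiedTraceback.enabled": "true",
}

submit_args = to_submit_args(runner_conf)
```

The same key/value pairs could instead be passed to `SparkSession.builder.config(key, value)` when the session is built in code.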