PythonEvalType

PythonEvalType defines the types of commands that are sent to the Python worker for execution.

Name                          Value  PandasUDFType
SQL_GROUPED_AGG_PANDAS_UDF    202    GROUPED_AGG
SQL_GROUPED_MAP_PANDAS_UDF    201    GROUPED_MAP
SQL_SCALAR_PANDAS_UDF         200    SCALAR
SQL_SCALAR_PANDAS_ITER_UDF    204    SCALAR_ITER

PythonEvalType is defined in the org.apache.spark.api.python Scala package, with the same values defined on the Python side in the PythonEvalType Python class (in the pyspark/rdd.py module).
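The mapping in the table can be restated as a small standalone sketch. It does not import PySpark; the names EVAL_TYPES, PANDAS_UDF_TYPES and pandas_udf_type_for are hypothetical and exist only for illustration, while the numeric values are copied from the table.

```python
# Standalone sketch: restates the table, it is NOT the PySpark class itself.
# EVAL_TYPES, PANDAS_UDF_TYPES and pandas_udf_type_for are hypothetical names.

# Eval type name -> integer value sent to the Python worker
EVAL_TYPES = {
    "SQL_SCALAR_PANDAS_UDF": 200,
    "SQL_GROUPED_MAP_PANDAS_UDF": 201,
    "SQL_GROUPED_AGG_PANDAS_UDF": 202,
    "SQL_SCALAR_PANDAS_ITER_UDF": 204,
}

# Integer value -> PandasUDFType name from the third column of the table
PANDAS_UDF_TYPES = {
    200: "SCALAR",
    201: "GROUPED_MAP",
    202: "GROUPED_AGG",
    204: "SCALAR_ITER",
}


def pandas_udf_type_for(eval_type_name: str) -> str:
    """Look up the PandasUDFType name for an eval type name from the table."""
    return PANDAS_UDF_TYPES[EVAL_TYPES[eval_type_name]]
```

For example, pandas_udf_type_for("SQL_GROUPED_AGG_PANDAS_UDF") looks up 202 first and then the GROUPED_AGG name.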

SQL_GROUPED_AGG_PANDAS_UDF

SQL_GROUPED_AGG_PANDAS_UDF marks Grouped Aggregate Pandas UDFs (pandas User-Defined Aggregate Functions, or pandas UDAFs).

SQL_GROUPED_AGG_PANDAS_UDF is executed using the AggregateInPandasExec physical operator (which uses ArrowPythonRunner).

Limitations of Pandas UDAFs:

  • Return type cannot be StructType
  • Not supported in the PIVOT clause
  • Not supported in streaming aggregation
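A minimal sketch of a pandas UDAF that stays within the limitations above, assuming PySpark with pandas and PyArrow is installed. The function names mean_of and make_grouped_agg_udf are hypothetical. The explicit PandasUDFType.GROUPED_AGG argument is used here to show the link to the eval type; recent Spark versions prefer Python type hints and may emit a warning for this style.

```python
def mean_of(values):
    """Reduce the values of one group to a single double.

    Uses only sum() and len(), so it accepts the pandas.Series that Spark
    passes in as well as a plain list.
    """
    return float(sum(values)) / len(values)


def make_grouped_agg_udf():
    """Wrap mean_of as a Grouped Aggregate Pandas UDF (hypothetical helper)."""
    # Imported lazily so this module loads even without PySpark installed.
    from pyspark.sql.functions import pandas_udf, PandasUDFType

    # The return type is a scalar type ("double"), not a StructType,
    # in line with the first limitation listed above.
    return pandas_udf(mean_of, "double", PandasUDFType.GROUPED_AGG)
```

With a DataFrame df that has id and v columns (both hypothetical), the UDF could be applied in a batch aggregation as df.groupBy("id").agg(make_grouped_agg_udf()("v")).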

SQL_GROUPED_AGG_PANDAS_UDF is used (on the Python side) when:

  • pyspark/worker.py is requested to read_single_udf and read_udfs
  • pyspark/sql/pandas/functions.py is requested to _create_pandas_udf and pandas_udf

SQL_GROUPED_AGG_PANDAS_UDF is used (on the Scala side) when:

SQL_SCALAR_PANDAS_UDF

SQL_SCALAR_PANDAS_UDF is among SCALAR_TYPES of PythonUDF.

SQL_SCALAR_PANDAS_UDF and SQL_SCALAR_PANDAS_ITER_UDF are both evaluated using ArrowEvalPython.
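The difference between the two scalar eval types can be sketched as follows, assuming PySpark with pandas and PyArrow is installed. The names plus_one, plus_one_iter and make_scalar_udfs are hypothetical. As above, the explicit PandasUDFType argument is used only to make the eval type visible; recent Spark versions prefer type hints and may warn about this style.

```python
def plus_one(values):
    """Scalar: one batch in, one batch of the same length out.

    On a pandas.Series, `values + 1` is applied element-wise.
    """
    return values + 1


def plus_one_iter(batches):
    """Scalar iterator: an iterator of batches in, an iterator of batches out."""
    for batch in batches:
        yield batch + 1


def make_scalar_udfs():
    """Wrap both functions as pandas UDFs (hypothetical helper)."""
    # Imported lazily so this module loads even without PySpark installed.
    from pyspark.sql.functions import pandas_udf, PandasUDFType

    scalar = pandas_udf(plus_one, "long", PandasUDFType.SCALAR)
    scalar_iter = pandas_udf(plus_one_iter, "long", PandasUDFType.SCALAR_ITER)
    return scalar, scalar_iter
```

The iterator variant is useful when some setup should happen once per task rather than once per batch, since code placed before the for loop runs only once for the whole iterator.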

SQL_SCALAR_PANDAS_UDF is used (on the Python side) when:

  • pyspark/worker.py is requested to read_single_udf and read_udfs
  • pyspark/sql/pandas/functions.py is requested to _create_pandas_udf and pandas_udf

SQL_SCALAR_PANDAS_ITER_UDF

User-Defined Functions

UDFRegistration allows user-defined functions to be one of the following PythonEvalTypes: