PythonEvalType¶

PythonEvalType defines the types of commands that are sent to the Python worker for execution.

Name	Value	PandasUDFType
SQL_GROUPED_AGG_PANDAS_UDF	202	GROUPED_AGG
SQL_GROUPED_MAP_PANDAS_UDF	201	GROUPED_MAP
SQL_SCALAR_PANDAS_UDF	200	SCALAR
SQL_SCALAR_PANDAS_ITER_UDF	204	SCALAR_ITER

PythonEvalType is defined in the org.apache.spark.api.python Scala package. The same values are defined on the Python side in the PythonEvalType class (in the pyspark/rdd.py module).
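To make the table concrete, here is a minimal standalone sketch that mirrors its values in plain Python. EvalTypeSketch, PANDAS_UDF_TYPE and describe are hypothetical names used only for illustration. The sketch uses an IntEnum for convenience and does not reproduce how the actual PythonEvalType class is written.

```python
from enum import IntEnum


class EvalTypeSketch(IntEnum):
    """Illustrative stand-in for PythonEvalType; values copied from the table above."""

    SQL_SCALAR_PANDAS_UDF = 200
    SQL_GROUPED_MAP_PANDAS_UDF = 201
    SQL_GROUPED_AGG_PANDAS_UDF = 202
    SQL_SCALAR_PANDAS_ITER_UDF = 204


# PandasUDFType names taken from the third column of the table.
PANDAS_UDF_TYPE = {
    EvalTypeSketch.SQL_SCALAR_PANDAS_UDF: "SCALAR",
    EvalTypeSketch.SQL_GROUPED_MAP_PANDAS_UDF: "GROUPED_MAP",
    EvalTypeSketch.SQL_GROUPED_AGG_PANDAS_UDF: "GROUPED_AGG",
    EvalTypeSketch.SQL_SCALAR_PANDAS_ITER_UDF: "SCALAR_ITER",
}


def describe(value: int) -> str:
    """Resolve an integer eval type to its name and PandasUDFType."""
    eval_type = EvalTypeSketch(value)  # raises ValueError for values not in the table
    return f"{eval_type.name} ({PANDAS_UDF_TYPE[eval_type]})"
```

For example, describe(202) resolves to the SQL_GROUPED_AGG_PANDAS_UDF entry together with its GROUPED_AGG PandasUDFType.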

SQL_GROUPED_AGG_PANDAS_UDF¶

SQL_GROUPED_AGG_PANDAS_UDF marks Grouped Aggregate Pandas UDFs (pandas User-Defined Aggregate Functions, or pandas UDAFs).

SQL_GROUPED_AGG_PANDAS_UDF is executed using the AggregateInPandasExec physical operator (which uses ArrowPythonRunner).
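The aggregate semantics can be illustrated without Spark. The sketch below is a plain-Python emulation, not the AggregateInPandasExec code path. It assumes each group's values are collected and handed to a user function that reduces them to a single scalar, with plain lists standing in for pandas data. grouped_agg is a hypothetical helper.

```python
from collections import defaultdict
from statistics import mean
from typing import Callable, Dict, Hashable, Iterable, List, Tuple


def grouped_agg(
    rows: Iterable[Tuple[Hashable, float]],
    agg: Callable[[List[float]], float],
) -> Dict[Hashable, float]:
    """Apply `agg` once per group, reducing each group's values to one scalar."""
    groups: Dict[Hashable, List[float]] = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    return {key: agg(values) for key, values in groups.items()}


# Two rows in group "a", one row in group "b".
rows = [("a", 1.0), ("a", 3.0), ("b", 10.0)]
result = grouped_agg(rows, mean)
```

Here result maps each group key to one value, which is why a scalar (non-struct) return type fits this kind of UDF naturally.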

Limitations of Pandas UDAFs:

Return type cannot be StructType
Not supported in the PIVOT clause
Not supported in streaming aggregation

SQL_GROUPED_AGG_PANDAS_UDF is used (on Python side) when:

pyspark/worker.py is requested to read_single_udf and read_udfs
pyspark/sql/pandas/functions.py is requested to _create_pandas_udf and pandas_udf

SQL_GROUPED_AGG_PANDAS_UDF is used (on Scala side) when:

PythonUDF is requested for isGroupedAggPandasUDF

SQL_SCALAR_PANDAS_UDF¶

SQL_SCALAR_PANDAS_UDF is among SCALAR_TYPES of PythonUDF.

SQL_SCALAR_PANDAS_UDF (along with SQL_SCALAR_PANDAS_ITER_UDF) is evaluated using ArrowEvalPython.
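The relationship between eval types and the operators named on this page can be summarized as a lookup. This is only a sketch of what the text states. OPERATOR_BY_EVAL_TYPE and operator_for are hypothetical names, and Spark's planner does not necessarily choose operators through such a table.

```python
# Eval type values as listed in the table at the top of this page.
SQL_SCALAR_PANDAS_UDF = 200
SQL_GROUPED_AGG_PANDAS_UDF = 202
SQL_SCALAR_PANDAS_ITER_UDF = 204

# Operator names exactly as they appear on this page.
OPERATOR_BY_EVAL_TYPE = {
    SQL_SCALAR_PANDAS_UDF: "ArrowEvalPython",
    SQL_SCALAR_PANDAS_ITER_UDF: "ArrowEvalPython",
    SQL_GROUPED_AGG_PANDAS_UDF: "AggregateInPandasExec",
}


def operator_for(eval_type: int) -> str:
    """Return the operator this page associates with the given eval type."""
    try:
        return OPERATOR_BY_EVAL_TYPE[eval_type]
    except KeyError:
        raise ValueError(
            f"No operator recorded on this page for eval type {eval_type}"
        ) from None
```

Both scalar variants resolve to the same operator, while the grouped aggregate type resolves to AggregateInPandasExec.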

SQL_SCALAR_PANDAS_UDF is used (on Python side) when:

pyspark/worker.py is requested to read_single_udf and read_udfs
pyspark/sql/pandas/functions.py is requested to _create_pandas_udf and pandas_udf

SQL_SCALAR_PANDAS_ITER_UDF¶

User-Defined Functions¶

UDFRegistration allows user-defined functions to be one of the following PythonEvalTypes:

SQL_BATCHED_UDF
SQL_SCALAR_PANDAS_UDF
SQL_SCALAR_PANDAS_ITER_UDF
SQL_GROUPED_AGG_PANDAS_UDF
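
The restriction above amounts to a membership check. The following sketch works on eval type names only. REGISTRABLE_EVAL_TYPES and check_registrable are hypothetical, and the error message is illustrative rather than the one UDFRegistration raises.

```python
# Names of the eval types this page lists as accepted by UDFRegistration.
REGISTRABLE_EVAL_TYPES = frozenset(
    {
        "SQL_BATCHED_UDF",
        "SQL_SCALAR_PANDAS_UDF",
        "SQL_SCALAR_PANDAS_ITER_UDF",
        "SQL_GROUPED_AGG_PANDAS_UDF",
    }
)


def check_registrable(eval_type_name: str) -> None:
    """Raise ValueError if the eval type is not in the accepted list."""
    if eval_type_name not in REGISTRABLE_EVAL_TYPES:
        allowed = ", ".join(sorted(REGISTRABLE_EVAL_TYPES))
        raise ValueError(
            f"{eval_type_name} cannot be registered; expected one of: {allowed}"
        )
```

Under this sketch, SQL_GROUPED_MAP_PANDAS_UDF is rejected because it is not in the list above.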