PythonEvalType¶
PythonEvalType defines the types of commands that are sent to the Python worker for execution.
| Name | Value | PandasUDFType |
|---|---|---|
| SQL_GROUPED_AGG_PANDAS_UDF | 202 | GROUPED_AGG |
| SQL_GROUPED_MAP_PANDAS_UDF | 201 | GROUPED_MAP |
| SQL_SCALAR_PANDAS_UDF | 200 | SCALAR |
| SQL_SCALAR_PANDAS_ITER_UDF | 204 | SCALAR_ITER |
PythonEvalType is defined in the org.apache.spark.api.python Scala package, with the same values defined on the Python side in the PythonEvalType Python class (in the pyspark/rdd.py module).
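To make the mapping in the table concrete, here is a minimal, dependency-free sketch. `EvalTypeSketch`, `PANDAS_UDF_TYPE_NAMES` and `describe` are hypothetical names used for illustration only; this is not the PySpark class, just a mirror of the four values listed above.

```python
from enum import IntEnum


class EvalTypeSketch(IntEnum):
    """Illustrative mirror of the four table rows (NOT the real PySpark class)."""

    SQL_SCALAR_PANDAS_UDF = 200
    SQL_GROUPED_MAP_PANDAS_UDF = 201
    SQL_GROUPED_AGG_PANDAS_UDF = 202
    SQL_SCALAR_PANDAS_ITER_UDF = 204


# PandasUDFType names from the table, keyed by eval type.
PANDAS_UDF_TYPE_NAMES = {
    EvalTypeSketch.SQL_SCALAR_PANDAS_UDF: "SCALAR",
    EvalTypeSketch.SQL_GROUPED_MAP_PANDAS_UDF: "GROUPED_MAP",
    EvalTypeSketch.SQL_GROUPED_AGG_PANDAS_UDF: "GROUPED_AGG",
    EvalTypeSketch.SQL_SCALAR_PANDAS_ITER_UDF: "SCALAR_ITER",
}


def describe(value: int) -> str:
    """Resolve an integer eval type to its name and PandasUDFType label."""
    eval_type = EvalTypeSketch(value)  # raises ValueError for values not in the table
    return f"{eval_type.name} ({PANDAS_UDF_TYPE_NAMES[eval_type]})"
```

Because both sides only exchange the integer, a lookup such as `describe(202)` is enough to recover which kind of pandas UDF a command refers to.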
SQL_GROUPED_AGG_PANDAS_UDF¶
SQL_GROUPED_AGG_PANDAS_UDF is a UDF marker of Grouped Aggregate Pandas UDFs (pandas User-Defined Aggregate Functions, pandas UDAFs).
SQL_GROUPED_AGG_PANDAS_UDF is executed using the AggregateInPandasExec physical operator (using ArrowPythonRunner).
Limitations of Pandas UDAFs:
- Return type cannot be StructType
- Not supported in the PIVOT clause
- Not supported in streaming aggregation
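The contract behind a grouped aggregate function (all values of one group in, a single scalar out) can be sketched without Spark. The following is a plain-Python simulation under stated assumptions: `grouped_agg` is a hypothetical helper, and lists stand in for the pandas.Series that a real pandas UDAF would receive.

```python
from collections import defaultdict
from statistics import mean
from typing import Callable, Dict, Iterable, List, Tuple


def grouped_agg(
    rows: Iterable[Tuple[str, float]],
    agg: Callable[[List[float]], float],
) -> Dict[str, float]:
    """Collect values per key, then reduce each group to ONE scalar."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    # The aggregate function sees a whole group at once and returns a scalar,
    # which is why a struct-like return value does not fit this contract.
    return {key: agg(values) for key, values in groups.items()}


rows = [("a", 1.0), ("a", 3.0), ("b", 10.0)]
result = grouped_agg(rows, mean)
```

Here `result` holds one value per group key. In PySpark itself the aggregate function would be wrapped with `pandas_udf`; that step is omitted so the sketch runs without pandas or Spark installed.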
SQL_GROUPED_AGG_PANDAS_UDF is used (on Python side) when:
- pyspark/worker.py is requested to read_single_udf and read_udfs
- pyspark/sql/pandas/functions.py is requested to _create_pandas_udf and pandas_udf
SQL_GROUPED_AGG_PANDAS_UDF is used (on Scala side) when:
- PythonUDF is requested for isGroupedAggPandasUDF
SQL_SCALAR_PANDAS_UDF¶
SQL_SCALAR_PANDAS_UDF is among SCALAR_TYPES of PythonUDF.
SQL_SCALAR_PANDAS_UDF (along with SQL_SCALAR_PANDAS_ITER_UDF) is evaluated using ArrowEvalPython.
SQL_SCALAR_PANDAS_UDF is used (on Python side) when:
- pyspark/worker.py is requested to read_single_udf and read_udfs
- pyspark/sql/pandas/functions.py is requested to _create_pandas_udf and pandas_udf
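The difference between the two scalar flavours is the shape of the function: a scalar function maps one batch to one batch of the same length, while a scalar-iterator function maps an iterator of batches to an iterator of batches. A minimal plain-Python sketch of both shapes follows; `plus_one` and `plus_one_iter` are hypothetical, and lists stand in for the pandas.Series batches used by real pandas UDFs.

```python
from typing import Iterator, List

Batch = List[float]  # stand-in for a pandas.Series batch (assumption for this sketch)


def plus_one(batch: Batch) -> Batch:
    """Scalar shape: one batch in, one batch of the same length out."""
    return [x + 1 for x in batch]


def plus_one_iter(batches: Iterator[Batch]) -> Iterator[Batch]:
    """Scalar-iterator shape: iterator of batches in, iterator of batches out."""
    # Any expensive one-time setup would go here, before the loop,
    # so it is shared across all batches.
    for batch in batches:
        yield [x + 1 for x in batch]


batches = [[1.0, 2.0], [3.0]]
scalar_out = [plus_one(b) for b in batches]
iter_out = list(plus_one_iter(iter(batches)))
```

Both variants produce the same batches here; the iterator form mainly matters when per-batch setup is costly.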
SQL_SCALAR_PANDAS_ITER_UDF¶
User-Defined Functions¶
UDFRegistration allows user-defined functions to be one of the following PythonEvalTypes: