PythonEvalType¶
PythonEvalType defines the types of commands that are sent to the Python worker for execution.
Name | Value | PandasUDFType |
---|---|---|
SQL_GROUPED_AGG_PANDAS_UDF | 202 | GROUPED_AGG |
SQL_GROUPED_MAP_PANDAS_UDF | 201 | GROUPED_MAP |
SQL_SCALAR_PANDAS_UDF | 200 | SCALAR |
SQL_SCALAR_PANDAS_ITER_UDF | 204 | SCALAR_ITER |
PythonEvalType is defined in the org.apache.spark.api.python Scala package, with the same values defined on the Python side in the PythonEvalType Python class (in the pyspark/rdd.py module).
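The table above can be mirrored in a few lines of plain Python. This is a minimal sketch that does not import PySpark: EvalTypeSketch, PANDAS_UDF_TYPE_NAMES and pandas_udf_type_name are hypothetical names introduced here for illustration, and only the four values listed in the table are included.

```python
from enum import IntEnum


class EvalTypeSketch(IntEnum):
    """Stand-in for the pandas UDF eval types in the table (hypothetical class)."""

    SQL_SCALAR_PANDAS_UDF = 200
    SQL_GROUPED_MAP_PANDAS_UDF = 201
    SQL_GROUPED_AGG_PANDAS_UDF = 202
    SQL_SCALAR_PANDAS_ITER_UDF = 204


# The PandasUDFType column of the table, keyed by eval type.
PANDAS_UDF_TYPE_NAMES = {
    EvalTypeSketch.SQL_SCALAR_PANDAS_UDF: "SCALAR",
    EvalTypeSketch.SQL_GROUPED_MAP_PANDAS_UDF: "GROUPED_MAP",
    EvalTypeSketch.SQL_GROUPED_AGG_PANDAS_UDF: "GROUPED_AGG",
    EvalTypeSketch.SQL_SCALAR_PANDAS_ITER_UDF: "SCALAR_ITER",
}


def pandas_udf_type_name(eval_type: int) -> str:
    """Return the PandasUDFType name for a numeric eval type from the table.

    Raises ValueError for a number that is not one of the four table rows.
    """
    return PANDAS_UDF_TYPE_NAMES[EvalTypeSketch(eval_type)]
```

Because the eval type is a plain integer on both the Scala and Python sides, a lookup like this is enough to translate between the number that travels to the worker and the name a user writes.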
SQL_GROUPED_AGG_PANDAS_UDF¶
SQL_GROUPED_AGG_PANDAS_UDF is a UDF marker of Grouped Aggregate Pandas UDFs (pandas User-Defined Aggregate Functions, pandas UDAFs).
SQL_GROUPED_AGG_PANDAS_UDF is executed using the AggregateInPandasExec physical operator (using ArrowPythonRunner).
Limitations of Pandas UDAFs:
- Return type cannot be StructType
- Not supported in the PIVOT clause
- Not supported in streaming aggregation
SQL_GROUPED_AGG_PANDAS_UDF is used (on Python side) when:
- pyspark/worker.py is requested to read_single_udf and read_udfs
- pyspark/sql/pandas/functions.py is requested to _create_pandas_udf and pandas_udf
SQL_GROUPED_AGG_PANDAS_UDF is used (on Scala side) when:
- PythonUDF is requested for isGroupedAggPandasUDF
SQL_SCALAR_PANDAS_UDF¶
SQL_SCALAR_PANDAS_UDF is among SCALAR_TYPES of PythonUDF.
SQL_SCALAR_PANDAS_UDF (together with SQL_SCALAR_PANDAS_ITER_UDF) is evaluated using ArrowEvalPython.
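The pairing of the two scalar pandas eval types with ArrowEvalPython can be pictured as a simple membership check. The sketch below is plain Python, not Spark code: ARROW_EVAL_TYPES and operator_for are hypothetical names, the set holds only the two eval types named in this section, and the real SCALAR_TYPES of PythonUDF may contain further entries that are not modeled here.

```python
# Values taken from the table at the top of this page.
SQL_SCALAR_PANDAS_UDF = 200
SQL_SCALAR_PANDAS_ITER_UDF = 204

# Hypothetical subset: only the two scalar pandas eval types named in the text.
ARROW_EVAL_TYPES = frozenset({SQL_SCALAR_PANDAS_UDF, SQL_SCALAR_PANDAS_ITER_UDF})


def operator_for(eval_type: int) -> str:
    """Name the operator this page associates with a scalar pandas eval type."""
    if eval_type in ARROW_EVAL_TYPES:
        return "ArrowEvalPython"
    # Other eval types are outside the scope of this sketch.
    raise ValueError(f"eval type {eval_type} is not covered by this sketch")
```

Both 200 and 204 resolve to the same operator name, which is the point of the sentence above: the iterator variant does not get a separate evaluation operator.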
SQL_SCALAR_PANDAS_UDF is used (on Python side) when:
- pyspark/worker.py is requested to read_single_udf and read_udfs
- pyspark/sql/pandas/functions.py is requested to _create_pandas_udf and pandas_udf
SQL_SCALAR_PANDAS_ITER_UDF¶
User-Defined Functions¶
UDFRegistration allows user-defined functions to be one of the following PythonEvalTypes: