AggregateInPandasExec Physical Operator
AggregateInPandasExec is a unary physical operator (Spark SQL) that executes pandas UDAFs using ArrowPythonRunner (one per partition).
Creating Instance
AggregateInPandasExec takes the following to be created:
- Grouping Expressions (Spark SQL) (Seq[NamedExpression])
- pandas UDAFs (PythonUDFs with SQL_GROUPED_AGG_PANDAS_UDF)
- Result Named Expressions (Spark SQL) (Seq[NamedExpression])
- Child Physical Operator (Spark SQL)
AggregateInPandasExec is created when the Aggregation execution planning strategy (Spark SQL) is executed for Aggregate logical operators (Spark SQL) whose aggregate expressions are all PythonUDFs.
Executing Operator
doExecute uses ArrowPythonRunner (one per partition) to execute PythonUDFs.
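The per-partition behaviour can be pictured with a plain pandas emulation. This is a conceptual sketch only, not Spark's implementation: Spark sends the data to a Python worker in Arrow format, whereas the code below stays in a single process. The helper run_partition, the sample partitions, and the assumption that rows are grouped by key inside each partition before the UDAF is applied are all illustrative.

```python
import pandas as pd

def mean_udaf(v: pd.Series) -> float:
    # A grouped-aggregate function: one pandas Series in, one scalar out.
    return float(v.mean())

def run_partition(rows, key, value, udaf):
    # Hypothetical stand-in for one runner handling one partition:
    # group the partition's rows by key and apply the UDAF to each group.
    pdf = pd.DataFrame(rows)
    return {k: udaf(group[value]) for k, group in pdf.groupby(key)}

# Assumption: rows with the same key already sit in the same partition.
partitions = [
    [{"id": "a", "v": 1.0}, {"id": "a", "v": 2.0}],
    [{"id": "b", "v": 3.0}],
]

# One "runner" invocation per partition.
results = [run_partition(p, "id", "v", mean_udaf) for p in partitions]
print(results)
```

Each partition yields one value per grouping key, which mirrors the shape of the operator's output: one row per group.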