AggregateInPandasExec Physical Operator
AggregateInPandasExec is a unary physical operator (Spark SQL).
Creating Instance
AggregateInPandasExec takes the following to be created:
- Grouping Expressions (Spark SQL) (Seq[NamedExpression])
- PythonUDFs
- Result Named Expressions (Spark SQL) (Seq[NamedExpression])
- Child Physical Operator (Spark SQL)
AggregateInPandasExec is created when the Aggregation execution planning strategy (Spark SQL) is executed for Aggregate logical operators (Spark SQL) whose aggregate expressions are all PythonUDFs.
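As an illustration of a query that should be planned this way, the sketch below defines a grouped-aggregate pandas UDF and prints the physical plan of a groupBy aggregation that uses only that UDF. The names mean_of and explain_grouped_agg, the sample data, and the local[1] master are assumptions made for the example. It also assumes pyspark, pandas and pyarrow are installed. On Spark 3.x the printed plan is expected to include an AggregateInPandas node, though the exact plan text depends on the Spark version.

```python
import pandas as pd


def mean_of(v: pd.Series) -> float:
    """Reduce one group's values to a single scalar (grouped-aggregate shape)."""
    return float(v.mean())


def explain_grouped_agg() -> None:
    """Print the physical plan of an aggregation that uses only a pandas UDF."""
    # Imported lazily so mean_of stays usable where Spark is not installed.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import pandas_udf

    # Hypothetical local session and sample data, for illustration only.
    spark = (
        SparkSession.builder.master("local[1]")
        .appName("aggregate-in-pandas-demo")
        .getOrCreate()
    )
    df = spark.createDataFrame(
        [(1, 1.0), (1, 2.0), (2, 3.0), (2, 5.0), (2, 10.0)], ("id", "v")
    )

    # The Series -> float type hints mark this as a grouped-aggregate pandas UDF.
    mean_udf = pandas_udf(mean_of, "double")

    # The only aggregate expression is the Python UDF.
    df.groupBy("id").agg(mean_udf(df["v"])).explain()
    spark.stop()
```

Calling explain_grouped_agg() prints the plan. Mixing the pandas UDF with a built-in aggregate such as sum in the same agg call is not covered by this sketch.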
Executing Operator
doExecute(): RDD[InternalRow]
doExecute uses ArrowPythonRunner (one per partition) to execute PythonUDFs.
doExecute is part of the SparkPlan (Spark SQL) abstraction.
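The per-partition execution can be pictured with a small conceptual model in plain Python. This is not Spark's code and does not use ArrowPythonRunner. It only mimics the observable behaviour of a grouped aggregate: within one partition, each group of values is reduced to one output row. The function aggregate_partition and the sample partitions are hypothetical, and the model assumes that rows sharing a key arrive next to each other within a partition.

```python
from itertools import groupby
from operator import itemgetter
from statistics import mean


def aggregate_partition(rows, udf):
    """Model one partition: yield (key, udf(values)) for each run of equal keys.

    rows is an iterable of (key, value) pairs, assumed to be clustered by key.
    """
    for key, group in groupby(rows, key=itemgetter(0)):
        values = [value for _, value in group]
        yield (key, udf(values))


# Two hypothetical partitions; each is processed independently of the other.
partitions = [
    [(1, 1.0), (1, 2.0)],
    [(2, 3.0), (2, 5.0), (2, 10.0)],
]

result = [row for part in partitions for row in aggregate_partition(part, mean)]
print(result)  # [(1, 1.5), (2, 6.0)]
```

In this model the udf argument plays the role of the Python UDF, and each call to aggregate_partition stands in for the work done for one partition.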
Last update: 2021-03-03