AggregateInPandasExec Physical Operator

AggregateInPandasExec is a unary physical operator (Spark SQL) that executes pandas UDAFs, using one ArrowPythonRunner per partition.

Creating Instance

AggregateInPandasExec takes the following to be created:

AggregateInPandasExec is created when the Aggregation execution planning strategy (Spark SQL) plans an Aggregate logical operator (Spark SQL) whose aggregate expressions are all PythonUDFs.

Executing Operator

SparkPlan
doExecute(): RDD[InternalRow]

doExecute is part of the SparkPlan (Spark SQL) abstraction.

doExecute creates one ArrowPythonRunner per partition to execute the PythonUDFs.
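
The "one runner per partition" idea can be pictured with a small, Spark-free model. This is only a conceptual sketch, not Spark's implementation: it assumes each partition's rows arrive already sorted by grouping key, and the names aggregate_partition, run, key_fn, value_fn and udaf are hypothetical, introduced here for illustration. Plain Python lists stand in for the Arrow batches that the real runner would exchange with a Python worker.

```python
from itertools import groupby
from statistics import mean


def aggregate_partition(rows, key_fn, value_fn, udaf):
    """Aggregate the rows of ONE partition (hypothetical helper).

    Assumes rows are already sorted by the grouping key, so that
    itertools.groupby sees every group as one consecutive run.
    """
    for key, group in groupby(rows, key=key_fn):
        # Collect the UDAF input values of the current group.
        values = [value_fn(row) for row in group]
        # One output row per group: the grouping key plus the aggregate.
        yield (key, udaf(values))


def run(partitions, key_fn, value_fn, udaf):
    """Process every partition independently, one 'runner' call each."""
    return [
        list(aggregate_partition(rows, key_fn, value_fn, udaf))
        for rows in partitions
    ]


# Two partitions of (id, v) rows, each sorted by id.
partitions = [
    [(1, 1.0), (1, 2.0), (2, 3.0)],
    [(3, 5.0), (3, 10.0)],
]

result = run(
    partitions,
    key_fn=lambda row: row[0],
    value_fn=lambda row: row[1],
    udaf=mean,  # stands in for a pandas UDAF such as a grouped mean
)
print(result)  # [[(1, 1.5), (2, 3.0)], [(3, 7.5)]]
```

Each partition is handled in isolation, so the sketch only produces correct aggregates if all rows of a given key sit in the same partition.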