FlatMapGroupsInPandasExec Physical Operator

FlatMapGroupsInPandasExec is a unary physical operator (Spark SQL) that executes a PythonUDF using ArrowPythonRunner (with SQL_GROUPED_MAP_PANDAS_UDF eval type).

FlatMapGroupsInPandasExec represents a FlatMapGroupsInPandas logical operator at execution time.

Creating Instance

FlatMapGroupsInPandasExec takes the following to be created:

FlatMapGroupsInPandasExec is created when:

Performance Metrics

FlatMapGroupsInPandasExec is a PythonSQLMetrics.

Executing Operator

SparkPlan
doExecute(): RDD[InternalRow]

doExecute is part of the SparkPlan (Spark SQL) abstraction.

doExecute requests the child physical operator to execute (and produce a RDD[InternalRow]).

For every non-empty partition (using RDD.mapPartitionsInternal), doExecute creates an ArrowPythonRunner (with SQL_GROUPED_MAP_PANDAS_UDF eval type) and then calls executePython. Empty partitions are passed through as-is, so no runner is created for them.
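
The per-partition behaviour described above can be modelled in plain Python. This is only a sketch of the control flow, not Spark's implementation: run_partition, subtract_mean and the runners_created counter are hypothetical names, rows are modelled as (key, value) tuples, and the sketch assumes that rows of the same group are adjacent within a partition.

```python
from itertools import chain, groupby

_EMPTY = object()  # sentinel marking an exhausted iterator


def run_partition(partition, key_of, udf, runners_created):
    """Hypothetical model of the per-partition step of doExecute."""
    rows = iter(partition)
    first = next(rows, _EMPTY)
    if first is _EMPTY:
        # Empty partition: nothing to do, so no runner is created.
        return []
    # Non-empty partition: "create a runner" (here we only count it).
    runners_created.append(1)
    out = []
    # Assumption: rows of the same group are adjacent within the partition.
    for _, group in groupby(chain([first], rows), key=key_of):
        out.extend(udf(list(group)))  # flat-map: a group may yield 0..n rows
    return out


def subtract_mean(group):
    """Example grouped-map function over (key, value) rows."""
    mean = sum(v for _, v in group) / len(group)
    return [(k, v - mean) for k, v in group]


partitions = [[("a", 1), ("a", 3), ("b", 10)], [], [("c", 5)]]
runners_created = []
result = [run_partition(p, lambda r: r[0], subtract_mean, runners_created)
          for p in partitions]
```

With three partitions, one of them empty, the counter records two "runners", and the empty partition yields an empty result without invoking the function at all.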