FlatMapGroupsInPandasExec Physical Operator
FlatMapGroupsInPandasExec is a unary physical operator (Spark SQL) that executes a PythonUDF using ArrowPythonRunner (with SQL_GROUPED_MAP_PANDAS_UDF eval type).
FlatMapGroupsInPandasExec represents a FlatMapGroupsInPandas logical operator at execution time.
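From the user's side, a query that ends up with this operator can be sketched as follows. This is a minimal illustration, not taken from this page: it assumes PySpark 3.0 or later with PyArrow installed, and the names `subtract_mean`, `run_example`, and `OUTPUT_SCHEMA`, the sample rows, and the `local[1]` master are all hypothetical.

```python
# Output schema of the grouped-map function (hypothetical example schema).
OUTPUT_SCHEMA = "id long, v double"


def subtract_mean(pdf):
    # pdf is a pandas.DataFrame holding all the rows of a single group.
    return pdf.assign(v=pdf.v - pdf.v.mean())


def run_example():
    # Imported lazily so the module can be loaded without PySpark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[1]").getOrCreate()
    df = spark.createDataFrame(
        [(1, 1.0), (1, 2.0), (2, 3.0), (2, 5.0), (2, 10.0)],
        ("id", "v"),
    )
    # groupBy(...).applyInPandas(...) is the grouped-map pandas API.
    result = df.groupBy("id").applyInPandas(subtract_mean, schema=OUTPUT_SCHEMA)
    # Print the physical plan of the query.
    result.explain()
    return result
```

Calling `run_example()` prints the physical plan, which should include a FlatMapGroupsInPandas node over the grouping column.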
Creating Instance
FlatMapGroupsInPandasExec takes the following to be created:
- Grouping Attributes (Spark SQL)
- Function Expression (Spark SQL)
- Output Attributes (Spark SQL)
- Child Physical Operator (Spark SQL)
FlatMapGroupsInPandasExec is created when:
- BasicOperators (Spark SQL) execution planning strategy is executed (on a logical query plan with FlatMapGroupsInPandas logical operators)
Performance Metrics
FlatMapGroupsInPandasExec is a PythonSQLMetrics.
Executing Operator
doExecute requests the child physical operator to execute (and produce an RDD[InternalRow]).
For every non-empty partition (using RDD.mapPartitionsInternal), doExecute creates an ArrowPythonRunner (with SQL_GROUPED_MAP_PANDAS_UDF eval type) and runs it using executePython.
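The per-partition flow above can be modelled in plain Python. This is only a sketch of the control flow, not Spark's implementation: tuples stand in for InternalRow, a direct function call stands in for ArrowPythonRunner, and the names `process_partition`, `map_partitions`, and `subtract_mean` are hypothetical. It also assumes the rows of each partition arrive already sorted by the grouping key.

```python
from itertools import chain, groupby

# Hypothetical stand-in: a row is a (key, value) tuple instead of an InternalRow.


def process_partition(rows, key, func):
    """Model of the work done for one partition."""
    rows = iter(rows)
    first = next(rows, None)
    if first is None:
        # Empty partition: nothing to group, so no "runner" is ever invoked.
        return
    # Put the peeked row back, then split the (key-sorted) rows into groups.
    for _, group in groupby(chain([first], rows), key=key):
        # Stand-in for sending one group to the Python worker.
        yield from func(list(group))


def map_partitions(partitions, key, func):
    """Model of mapPartitions: apply process_partition to every partition."""
    return [list(process_partition(p, key, func)) for p in partitions]


def subtract_mean(group):
    # Grouped-map function: receives all rows of one group, returns new rows.
    mean = sum(v for _, v in group) / len(group)
    return [(k, v - mean) for k, v in group]


partitions = [
    [(1, 1.0), (1, 2.0), (2, 3.0), (2, 5.0)],  # two groups
    [],                                          # empty partition, skipped
    [(3, 4.0)],                                  # one single-row group
]
result = map_partitions(partitions, key=lambda r: r[0], func=subtract_mean)
```

The empty partition yields an empty result without the function ever being called, which mirrors the non-empty-partition check described above.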