FlatMapGroupsInPandasExec Physical Operator¶
FlatMapGroupsInPandasExec is a unary physical operator (Spark SQL) that executes a PythonUDF using ArrowPythonRunner (with SQL_GROUPED_MAP_PANDAS_UDF eval type).
FlatMapGroupsInPandasExec represents a FlatMapGroupsInPandas logical operator at execution time.
Creating Instance¶
FlatMapGroupsInPandasExec takes the following to be created:
- Grouping Attributes (Spark SQL)
- Function Expression (Spark SQL)
- Output Attributes (Spark SQL)
- Child Physical Operator (Spark SQL)
FlatMapGroupsInPandasExec is created when:
- BasicOperators (Spark SQL) execution planning strategy is executed (on a logical query plan with FlatMapGroupsInPandas logical operators)
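The following is a minimal sketch of one way to get this operator into a physical plan, assuming that GroupedData.applyInPandas is what produces the FlatMapGroupsInPandas logical operator. The subtract_mean function, the show_plan helper and the sample data are hypothetical names used for illustration only. Running show_plan requires pyspark, pandas and pyarrow to be installed.

```python
def subtract_mean(pdf):
    """Grouped-map function: gets one group as a pandas DataFrame and returns a pandas DataFrame."""
    return pdf.assign(v=pdf.v - pdf.v.mean())


def show_plan():
    """Build a grouped-map query and print its physical plan (hypothetical helper)."""
    # Local import so this file still loads where PySpark is not installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("local[1]")
        .appName("flatmapgroups-demo")
        .getOrCreate()
    )
    try:
        df = spark.createDataFrame([(1, 1.0), (1, 2.0), (2, 3.0)], ("id", "v"))
        # applyInPandas takes the Python function and the schema of its output.
        grouped = df.groupBy("id").applyInPandas(subtract_mean, schema="id long, v double")
        # Prints the physical plan, which is expected to include a FlatMapGroupsInPandas node.
        grouped.explain()
    finally:
        spark.stop()
```

Call show_plan() from a Python session to print the plan. The grouping column (id) corresponds to the grouping attributes, and the schema string to the output attributes listed above.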
Performance Metrics¶
FlatMapGroupsInPandasExec is a PythonSQLMetrics.
Executing Operator¶
doExecute requests the child physical operator to execute (and produce an RDD[InternalRow]).
For every non-empty partition (using RDD.mapPartitionsInternal), doExecute creates an ArrowPythonRunner (with SQL_GROUPED_MAP_PANDAS_UDF eval type) and calls executePython with it to run the Python function over the grouped rows.
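The per-partition behaviour can be illustrated with a plain-Python sketch. This is not Spark code: flat_map_groups, Row and subtract_mean are hypothetical stand-ins, tuples play the role of InternalRow, and an ordinary function call replaces ArrowPythonRunner. The sketch also assumes that rows in a partition arrive already clustered by the grouping key.

```python
from itertools import chain, groupby
from operator import itemgetter
from typing import Callable, Iterable, Iterator, List, Tuple

Row = Tuple  # hypothetical stand-in for InternalRow


def flat_map_groups(
    partition: Iterable[Row],
    key_index: int,
    func: Callable[[List[Row]], Iterable[Row]],
) -> Iterator[Row]:
    """Apply func to every group of rows in one partition and flatten the results."""
    rows = iter(partition)
    first = next(rows, None)
    if first is None:
        # Empty partition: there is nothing to group, so func is never invoked.
        return
    # Put the first row back in front of the remaining ones.
    all_rows = chain([first], rows)
    # groupby merges adjacent rows only, hence the clustered-by-key assumption.
    for _, group in groupby(all_rows, key=itemgetter(key_index)):
        yield from func(list(group))


def subtract_mean(group: List[Row]) -> List[Row]:
    """Example grouped-map function: centre the values of one group."""
    mean = sum(v for _, v in group) / len(group)
    return [(k, v - mean) for k, v in group]


# Two partitions: one with two groups ("a" and "b") and one empty.
partitions = [[("a", 1.0), ("a", 3.0), ("b", 5.0)], []]
result = [list(flat_map_groups(p, 0, subtract_mean)) for p in partitions]
print(result)
```

Each group is handed to the function as a whole and the returned rows are concatenated into the output of the partition, which is the "flat map over groups" that gives the operator its name.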