
PandasCogroupedOps

PandasCogroupedOps is a logical grouping created by GroupedData.cogroup.

from pyspark.sql.pandas.group_ops import PandasCogroupedOps

PandasCogroupedOps is included in the __all__ of the pyspark.sql module (via __init__.py).

Creating Instance

PandasCogroupedOps takes the following to be created:

  • Two GroupedData instances (the two sides of the cogroup)

PandasCogroupedOps is created when:

  • PandasGroupedOpsMixin is requested to cogroup

applyInPandas

applyInPandas(self, func, schema)

applyInPandas creates a DataFrame from the result of flatMapCoGroupsInPandas with a pandas user-defined function of SQL_COGROUPED_MAP_PANDAS_UDF type.
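A minimal, pandas-only sketch of what a cogrouped map computes may help here. It is not Spark's implementation: cogrouped_map, summarize, and the sample frames are hypothetical names introduced for illustration, and the sketch assumes that a key present on only one side is paired with an empty DataFrame on the other side.

```python
import pandas as pd


def cogrouped_map(left, right, key, func):
    """Local emulation of a cogrouped map (hypothetical helper, not a Spark API)."""
    left_groups = {k: g for k, g in left.groupby(key)}
    right_groups = {k: g for k, g in right.groupby(key)}
    # Union of the keys from both sides, keeping first-seen order.
    keys = list(dict.fromkeys([*left_groups, *right_groups]))
    results = []
    for k in keys:
        # Assumption: a key missing on one side yields an empty frame
        # that still carries that side's columns.
        l = left_groups.get(k, left.iloc[0:0])
        r = right_groups.get(k, right.iloc[0:0])
        results.append(func(l, r))
    if not results:
        return pd.DataFrame()
    # The per-cogroup results are combined into a single DataFrame.
    return pd.concat(results, ignore_index=True)


def summarize(l, r):
    """Example user function: takes two DataFrames, returns one DataFrame."""
    key = l["id"].iloc[0] if len(l) else r["id"].iloc[0]
    return pd.DataFrame({"id": [key], "left_rows": [len(l)], "right_rows": [len(r)]})


left = pd.DataFrame({"id": [1, 1, 2], "v1": [1.0, 2.0, 3.0]})
right = pd.DataFrame({"id": [1, 3], "v2": ["a", "b"]})
out = cogrouped_map(left, right, "id", summarize)
print(out)
```

The user function sees all columns of each side for one key at a time, which is the contract applyInPandas expects of func.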


applyInPandas creates a pandas user-defined function for the given func, with the return type taken from the given schema. The pandas UDF is of SQL_COGROUPED_MAP_PANDAS_UDF type.

applyInPandas applies the pandas UDF to all the columns of the two GroupedData instances, which creates a Column expression.

applyInPandas requests the GroupedData for the associated RelationalGroupedDataset, which is in turn requested to flatMapCoGroupsInPandas.
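From the user's side, the flow above might be exercised as follows. This sketch follows the shape of the as-of join example in the PySpark API documentation; asof_join, run_on_spark, the sample rows, and the column names are assumptions for illustration, and calling run_on_spark requires a local Spark installation with PyArrow.

```python
import pandas as pd


def asof_join(left: pd.DataFrame, right: pd.DataFrame) -> pd.DataFrame:
    # Receives all columns of each side for one cogroup and
    # returns one pandas DataFrame matching the declared schema.
    return pd.merge_asof(left, right, on="time", by="id")


def run_on_spark():
    # pyspark is imported lazily so that asof_join can be tried with pandas alone.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[1]").getOrCreate()
    df1 = spark.createDataFrame(
        [(20000101, 1, 1.0), (20000101, 2, 2.0), (20000102, 1, 3.0), (20000102, 2, 4.0)],
        ("time", "id", "v1"),
    )
    df2 = spark.createDataFrame(
        [(20000101, 1, "x"), (20000101, 2, "y")],
        ("time", "id", "v2"),
    )
    # GroupedData.cogroup gives a PandasCogroupedOps.
    cogrouped = df1.groupby("id").cogroup(df2.groupby("id"))
    # func and schema are the two arguments of applyInPandas.
    return cogrouped.applyInPandas(
        asof_join, schema="time int, id int, v1 double, v2 string"
    )
```

Calling run_on_spark().show() would trigger the job. Keeping the pandas function separate from the Spark wiring makes it easy to check func on small local frames first.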


Last update: 2021-03-03