
PandasGroupedOpsMixin

PandasGroupedOpsMixin is a Python mixin for the GroupedData class.

applyInPandas

applyInPandas(
  self,
  func: "PandasGroupedMapFunction", # (1)!
  schema: Union[StructType, str]
) -> DataFrame
  1. from pandas.core.frame import DataFrame as PandasDataFrame
    DataFrameLike = PandasDataFrame
    PandasGroupedMapFunction = Union[
      # func: pandas.DataFrame -> pandas.DataFrame
      Callable[[DataFrameLike], DataFrameLike],
      # func: (groupKey(s), pandas.DataFrame) -> pandas.DataFrame
      Callable[[Any, DataFrameLike], DataFrameLike],
    ]
    

applyInPandas creates a pandas_udf with the following:

| pandas_udf | Value |
|---|---|
| f | The given func |
| returnType | The given schema |
| functionType | PandasUDFType.GROUPED_MAP |

applyInPandas creates a Column with the pandas_udf applied to all the columns of the DataFrame of this GroupedData.

applyInPandas requests the RelationalGroupedDataset to flatMapGroupsInPandas with the underlying Catalyst expression of the Column with the pandas_udf.

In the end, applyInPandas creates a DataFrame with the result.
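From the caller's side, the two accepted shapes of func can be sketched as follows. This is a minimal illustration, not part of the PySpark source: the names subtract_mean, mean_by_key and demo_apply_in_pandas are hypothetical, the columns id and v are made up for the example, and demo_apply_in_pandas assumes an active SparkSession is passed in.

```python
import pandas as pd


def subtract_mean(pdf: pd.DataFrame) -> pd.DataFrame:
    # One-argument form: pdf holds all the rows of a single group.
    return pdf.assign(v=pdf["v"] - pdf["v"].mean())


def mean_by_key(key, pdf: pd.DataFrame) -> pd.DataFrame:
    # Two-argument form: key is a tuple with the grouping value(s).
    return pd.DataFrame([key + (pdf["v"].mean(),)], columns=["id", "v"])


def demo_apply_in_pandas(spark):
    # spark is assumed to be an active SparkSession (hypothetical helper).
    df = spark.createDataFrame(
        [(1, 1.0), (1, 2.0), (2, 3.0), (2, 5.0), (2, 10.0)], ("id", "v")
    )
    grouped = df.groupBy("id")
    # schema describes the pandas.DataFrame returned by func.
    centered = grouped.applyInPandas(subtract_mean, schema="id long, v double")
    means = grouped.applyInPandas(mean_by_key, schema="id long, v double")
    return centered, means
```

Both functions are plain pandas code, so they can be checked on a pandas.DataFrame before being handed to applyInPandas.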

cogroup

cogroup(
  self,
  other: "GroupedData") -> "PandasCogroupedOps"

cogroup creates a PandasCogroupedOps for this GroupedData and the other GroupedData.
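A typical use of the returned PandasCogroupedOps is to call its applyInPandas with a function that receives one pandas.DataFrame per side for every grouping key. The sketch below is an illustration under assumptions: join_groups and demo_cogroup are hypothetical names, the columns time, id, v1 and v2 are made up, and demo_cogroup expects an active SparkSession.

```python
import pandas as pd


def join_groups(left: pd.DataFrame, right: pd.DataFrame) -> pd.DataFrame:
    # left and right hold the rows of the same grouping key from each side.
    return pd.merge(left, right, on=["time", "id"], how="left")


def demo_cogroup(spark):
    # spark is assumed to be an active SparkSession (hypothetical helper).
    df1 = spark.createDataFrame(
        [(20000101, 1, 1.0), (20000101, 2, 2.0),
         (20000102, 1, 3.0), (20000102, 2, 4.0)],
        ("time", "id", "v1"),
    )
    df2 = spark.createDataFrame(
        [(20000101, 1, "x"), (20000101, 2, "y")],
        ("time", "id", "v2"),
    )
    # cogroup pairs up the groups of both GroupedData objects by key.
    cogrouped = df1.groupBy("id").cogroup(df2.groupBy("id"))
    return cogrouped.applyInPandas(
        join_groups, schema="time long, id long, v1 double, v2 string"
    )
```

As with the grouped-map case, join_groups is ordinary pandas code and can be exercised on two small pandas.DataFrames without Spark.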