GroupedData

GroupedData is created for the following high-level operators:

GroupedData is then used to execute aggregate functions (over groups of rows) using agg operator:

GroupedData is a Python class that mixes in PandasGroupedOpsMixin.

GroupedData is defined in pyspark.sql.group module.

from pyspark.sql.group import GroupedData

Creating Instance

GroupedData takes the following to be created:

agg

agg(
  self,
  *exprs: Union[Column, Dict[str, str]]) -> DataFrame

Note

Built-in aggregation functions and pandas UDAFs cannot be used together in a single agg.

agg accepts either one or more Column expressions or a single Dict[str, str] object that maps a column name to the name of an aggregate function.

agg requests the RelationalGroupedDataset to agg (Spark SQL).

In the end, agg creates a DataFrame with the agg result.
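
The dispatch described above (a single dict versus one or more Column expressions) can be illustrated with a minimal pure-Python sketch. It does not use PySpark: StubColumn, StubGroupedDataset and agg_sketch are hypothetical stand-ins introduced only for illustration, and the real method may validate its arguments differently and returns a DataFrame rather than a tuple.

```python
from typing import Dict, List, Optional, Tuple, Union


class StubColumn:
    """Hypothetical stand-in for pyspark.sql.Column (not the real class)."""

    def __init__(self, expr: str) -> None:
        self.expr = expr


class StubGroupedDataset:
    """Hypothetical stand-in for the JVM-side RelationalGroupedDataset."""

    def agg(
        self,
        first: Union[Dict[str, str], str],
        rest: Optional[List[str]] = None,
    ) -> Tuple[str, object]:
        # Record which call form was used instead of running an aggregation.
        if isinstance(first, dict):
            return ("dict", dict(first))
        return ("columns", [first] + list(rest or []))


def agg_sketch(jgd: StubGroupedDataset, *exprs: Union[StubColumn, Dict[str, str]]):
    """Route the arguments the way the text describes (an assumption-based sketch)."""
    if not exprs:
        raise ValueError("exprs should not be empty")
    # Form 1: exactly one dict of {column name: aggregate function name}.
    if len(exprs) == 1 and isinstance(exprs[0], dict):
        return jgd.agg(exprs[0])
    # Form 2: one or more column expressions; anything else is rejected.
    if not all(isinstance(c, StubColumn) for c in exprs):
        raise TypeError("all exprs should be Column")
    # Pass the first expression separately and the remainder as a list.
    return jgd.agg(exprs[0].expr, [c.expr for c in exprs[1:]])


by_dict = agg_sketch(StubGroupedDataset(), {"age": "max", "salary": "avg"})
by_cols = agg_sketch(
    StubGroupedDataset(), StubColumn("max(age)"), StubColumn("avg(salary)")
)
```

In this sketch, mixing a dict with column expressions in one call raises a TypeError, which mirrors the rule that agg takes either form but not both at once.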