GroupByKey

GroupByKey<K, V> is a PTransform that takes a PCollection of key-value pairs and produces a PCollection that pairs each input key with all the values associated with that key.

In other words, GroupByKey<K, V> is the following:

PTransform<PCollection<KV<K, V>>, PCollection<KV<K, Iterable<V>>>>

which can be written in the following function-like notation:

PCollection<KV<K, V>> ~~[GroupByKey<K, V>]~~> PCollection<KV<K, Iterable<V>>>
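The shape of that transformation can be illustrated with a minimal in-memory sketch. It is not Beam code: GroupByKeyModel and groupByKey are hypothetical names, java.util.Map.Entry stands in for KV, and a List stands in for both PCollection and Iterable. The sketch keeps values in insertion order only to make the printed result deterministic; a distributed runner should not be assumed to preserve it.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of the grouping semantics; not part of Beam.
final class GroupByKeyModel {

  // Collects the values of equal keys into one list per key:
  // List<Entry<K, V>>  ~~>  Map<K, List<V>>
  static <K, V> Map<K, List<V>> groupByKey(List<Map.Entry<K, V>> pairs) {
    Map<K, List<V>> grouped = new LinkedHashMap<>();
    for (Map.Entry<K, V> pair : pairs) {
      grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
    }
    return grouped;
  }

  public static void main(String[] args) {
    // Same pairs as in the demo at the end of this page.
    List<Map.Entry<Integer, Integer>> pairs = List.of(
        Map.entry(0, 0), Map.entry(0, 1), Map.entry(1, 1));
    System.out.println(groupByKey(pairs)); // prints {0=[0, 1], 1=[1]}
  }
}
```

Key 0 appears twice in the input, so its two values end up in one list, while key 1 keeps its single value.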

GroupByKey.create Utility

GroupByKey<K, V> create()

create creates a GroupByKey with the fewKeys flag disabled (false).

GroupByKey.createWithFewKeys Utility

GroupByKey<K, V> createWithFewKeys()

createWithFewKeys creates a GroupByKey with the fewKeys flag enabled (true).

createWithFewKeys is used when Combine.PerKey is requested to expand.

Creating Instance

GroupByKey takes a single fewKeys flag to be created.

GroupByKey is created using create or createWithFewKeys utilities.
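The relationship between the fewKeys flag and the two utilities can be modelled as a private constructor behind two static factories. The following is a sketch of that pattern only, under the assumption that it mirrors the description above; GroupingSketch is a hypothetical class, not Beam's source.

```java
// Hypothetical model of the creation pattern described above; not Beam's source.
final class GroupingSketch<K, V> {

  // The single piece of state an instance is created with.
  private final boolean fewKeys;

  private GroupingSketch(boolean fewKeys) {
    this.fewKeys = fewKeys;
  }

  // Counterpart of create: fewKeys disabled.
  static <K, V> GroupingSketch<K, V> create() {
    return new GroupingSketch<>(false);
  }

  // Counterpart of createWithFewKeys: fewKeys enabled.
  static <K, V> GroupingSketch<K, V> createWithFewKeys() {
    return new GroupingSketch<>(true);
  }

  boolean fewKeys() {
    return fewKeys;
  }

  public static void main(String[] args) {
    GroupingSketch<String, Integer> regular = GroupingSketch.create();
    GroupingSketch<String, Integer> hinted = GroupingSketch.createWithFewKeys();
    System.out.println(regular.fewKeys()); // prints false
    System.out.println(hinted.fewKeys());  // prints true
  }
}
```

Keeping the constructor private means the flag can only be set through one of the two named factories, which is why callers never pass fewKeys directly.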

Demo

import org.apache.beam.sdk.Pipeline
val p = Pipeline.create()

import org.apache.beam.sdk.transforms.Create
import org.apache.beam.sdk.values.KV
val kvs = Create.of(
  KV.of(0,0),
  KV.of(0,1),
  KV.of(1,1))

val root = p.apply("Input Key-Value Pairs", kvs)

import org.apache.beam.sdk.transforms.GroupByKey
val groupByKey = GroupByKey.create[Int, Int]

scala> :type groupByKey
org.apache.beam.sdk.transforms.GroupByKey[Int,Int]

val grouped = root.apply("Group Numbers", groupByKey)