GroupByKey
GroupByKey<K, V> is a PTransform that takes a collection of key-value pairs and produces pairs of each input key and the values associated with it.
In other words, GroupByKey<K, V> is the following:
PTransform<PCollection<KV<K, V>>, PCollection<KV<K, Iterable<V>>>>
which can be written in function-oriented notation as:
PCollection<KV<K, V>> ~~[GroupByKey<K, V>]~~> PCollection<KV<K, Iterable<V>>>
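The type signature above can be read as plain grouping. The following is a minimal sketch that models the semantics on an in-memory list using only java.util collections. It is not Beam code: GroupByKeyModel and groupByKey are hypothetical names introduced for illustration, and Map.Entry stands in for KV. The model keeps values in input order, which, to my understanding, Beam itself does not guarantee.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, Beam-free model of what GroupByKey computes on a bounded input.
final class GroupByKeyModel {

  // Collects all values that share a key, mirroring KV<K, V> -> KV<K, Iterable<V>>.
  static <K, V> Map<K, List<V>> groupByKey(List<Map.Entry<K, V>> pairs) {
    Map<K, List<V>> grouped = new LinkedHashMap<>();
    for (Map.Entry<K, V> pair : pairs) {
      // Start a new group the first time a key is seen, then append the value.
      grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
    }
    return grouped;
  }

  public static void main(String[] args) {
    List<Map.Entry<Integer, Integer>> pairs =
        List.of(Map.entry(0, 0), Map.entry(0, 1), Map.entry(1, 1));
    System.out.println(groupByKey(pairs)); // prints {0=[0, 1], 1=[1]}
  }
}
```

The three input pairs collapse into two groups, one per distinct key.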
GroupByKey.create Utility
GroupByKey<K, V> create()
create creates a GroupByKey with the fewKeys flag disabled (false).
GroupByKey.createWithFewKeys Utility
GroupByKey<K, V> createWithFewKeys()
createWithFewKeys creates a GroupByKey with the fewKeys flag enabled (true).
createWithFewKeys is used when Combine.PerKey is requested to expand.
Creating Instance
GroupByKey takes a single fewKeys flag to be created.
GroupByKey is created using the create or createWithFewKeys utilities.
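The creation logic described above can be sketched as a private constructor behind two static factories. GroupByKeySketch is a hypothetical stand-in, not the Beam class: only the fewKeys flag is modeled, and everything a real PTransform does (e.g. expanding into a pipeline) is left out.

```java
// Hypothetical stand-in for GroupByKey<K, V>; only the construction logic is modeled.
final class GroupByKeySketch<K, V> {
  private final boolean fewKeys;

  // Private constructor: instances are obtained only through the factories below.
  private GroupByKeySketch(boolean fewKeys) {
    this.fewKeys = fewKeys;
  }

  // Mirrors create: the fewKeys flag is disabled.
  static <K, V> GroupByKeySketch<K, V> create() {
    return new GroupByKeySketch<>(false);
  }

  // Mirrors createWithFewKeys: the fewKeys flag is enabled.
  static <K, V> GroupByKeySketch<K, V> createWithFewKeys() {
    return new GroupByKeySketch<>(true);
  }

  boolean fewKeys() {
    return fewKeys;
  }

  public static void main(String[] args) {
    GroupByKeySketch<Integer, Integer> regular = GroupByKeySketch.create();
    GroupByKeySketch<Integer, Integer> few = GroupByKeySketch.createWithFewKeys();
    System.out.println(regular.fewKeys()); // prints false
    System.out.println(few.fewKeys());     // prints true
  }
}
```

Keeping the constructor private means the flag can only take the values the two factories choose, so callers never pass a raw boolean.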
Demo
// Create an empty pipeline
import org.apache.beam.sdk.Pipeline
val p = Pipeline.create()
// Define three key-value pairs; key 0 appears twice, key 1 once
import org.apache.beam.sdk.transforms.Create
import org.apache.beam.sdk.values.KV
val kvs = Create.of(
  KV.of(0, 0),
  KV.of(0, 1),
  KV.of(1, 1))
val root = p.apply("Input Key-Value Pairs", kvs)
// Create a GroupByKey transform for Int keys and Int values
import org.apache.beam.sdk.transforms.GroupByKey
val groupByKey = GroupByKey.create[Int, Int]
scala> :type groupByKey
org.apache.beam.sdk.transforms.GroupByKey[Int,Int]
// Apply the transform; each output element pairs a key with an Iterable of its values
val grouped = root.apply("Group Numbers", groupByKey)