ShuffleDependency takes the following to be created:
When created, ShuffleDependency gets shuffle id (as
ShuffleDependency uses the input RDD to access
In the end, ShuffleDependency registers itself for cleanup with
ShuffleDependency accesses the optional
shuffleHandle is the ShuffleHandle of a ShuffleDependency, assigned eagerly when the ShuffleDependency is created.
ShuffleDependency uses a mapSideCombine flag that controls whether to perform map-side partial aggregation (map-side combine) using an Aggregator.
mapSideCombine is disabled (false) by default and can be enabled (true) for some use cases of ShuffledRDD.
ShuffleDependency requires that the optional Aggregator is defined when the flag is enabled.
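The requirement above can be illustrated with a minimal, Spark-free sketch. SketchAggregator and SketchShuffleDependency are hypothetical stand-ins introduced only for this example (they are not Spark classes), and the error message is illustrative. The sketch models just the two fields discussed here and assumes the check runs at construction time.

```scala
import scala.util.Try

// Hypothetical stand-in for an Aggregator: three functions that build and merge combiners.
final case class SketchAggregator[K, V, C](
    createCombiner: V => C,
    mergeValue: (C, V) => C,
    mergeCombiners: (C, C) => C)

// Hypothetical stand-in for ShuffleDependency that models only aggregator and mapSideCombine.
final case class SketchShuffleDependency[K, V, C](
    aggregator: Option[SketchAggregator[K, V, C]] = None,
    mapSideCombine: Boolean = false) {
  // Map-side combine needs an Aggregator to combine with, so fail fast without one.
  if (mapSideCombine) {
    require(aggregator.isDefined, "Map-side combine without Aggregator specified!")
  }
}

object MapSideCombineCheck {
  def main(args: Array[String]): Unit = {
    val sum = SketchAggregator[String, Int, Int](v => v, _ + _, _ + _)

    // Defaults: no aggregator and map-side combine disabled.
    val plain = SketchShuffleDependency[String, Int, Int]()
    println(plain.mapSideCombine)

    // Enabling the flag is accepted when an aggregator is defined.
    val combined = SketchShuffleDependency(Some(sum), mapSideCombine = true)
    println(combined.aggregator.isDefined)

    // Enabling the flag without an aggregator is rejected by the require check.
    val invalid = Try(SketchShuffleDependency[String, Int, Int](mapSideCombine = true))
    println(invalid.isFailure)
  }
}
```

Failing at construction time keeps the misconfiguration close to where it was introduced, rather than surfacing later when a map task tries to combine records.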
mapSideCombine is used when:
aggregator: Option[Aggregator[K, V, C]] = None
aggregator is a map/reduce-side Aggregator (for an RDD's shuffle).
aggregator is by default undefined (i.e. None) when ShuffleDependency is created.
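To make the map-side and reduce-side roles of an aggregator more concrete, here is a rough sketch on plain Scala collections. CombineFns, combineValuesByKey and mergeCombinersByKey are hypothetical names for this example only. The sketch assumes an aggregator boils down to three functions (create a combiner, merge a value into it, merge two combiners) and does not reproduce how Spark actually executes a shuffle.

```scala
object MapSideCombineSketch {
  // Hypothetical stand-in for an aggregator's three functions.
  final case class CombineFns[K, V, C](
      createCombiner: V => C,
      mergeValue: (C, V) => C,
      mergeCombiners: (C, C) => C)

  // Map side: partially aggregate the records of a single partition by key.
  def combineValuesByKey[K, V, C](records: Seq[(K, V)], agg: CombineFns[K, V, C]): Map[K, C] =
    records.foldLeft(Map.empty[K, C]) { case (acc, (k, v)) =>
      val combiner = acc.get(k).map(agg.mergeValue(_, v)).getOrElse(agg.createCombiner(v))
      acc.updated(k, combiner)
    }

  // Reduce side: merge the partial combiners produced by all partitions.
  def mergeCombinersByKey[K, V, C](partials: Seq[Map[K, C]], agg: CombineFns[K, V, C]): Map[K, C] =
    partials.flatten.foldLeft(Map.empty[K, C]) { case (acc, (k, c)) =>
      acc.updated(k, acc.get(k).map(agg.mergeCombiners(_, c)).getOrElse(c))
    }

  def main(args: Array[String]): Unit = {
    val sum = CombineFns[String, Int, Int](v => v, _ + _, _ + _)

    // Two in-memory "partitions" standing in for the map-side inputs.
    val partition0 = Seq("a" -> 1, "b" -> 2, "a" -> 3)
    val partition1 = Seq("a" -> 4, "b" -> 5)

    // With map-side combine, each partition ships one combiner per key instead of every record.
    val partials = Seq(partition0, partition1).map(combineValuesByKey(_, sum))
    val result = mergeCombinersByKey(partials, sum)

    assert(result == Map("a" -> 8, "b" -> 7))
    println(result)
  }
}
```

In this sketch the first partition contributes two combiners rather than three records, which is the kind of reduction in shuffled data that map-side combine is meant to achieve.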