SortShuffleWriter — Fallback ShuffleWriter
SortShuffleWriter is a "fallback" ShuffleWriter (used when SortShuffleManager is requested for a ShuffleWriter and the more specialized BypassMergeSortShuffleWriter and UnsafeShuffleWriter could not be used).
SortShuffleWriter[K, V, C] is a parameterized type with K keys, V values, and C combiner values.
SortShuffleWriter takes the following to be created:
SortShuffleWriter is created when SortShuffleManager is requested for a ShuffleWriter.
Writing Records (Into Shuffle Partitioned File In Disk Store)
write(records: Iterator[Product2[K, V]]): Unit
write is part of the ShuffleWriter abstraction.
write creates an ExternalSorter based on the ShuffleDependency (of the BaseShuffleHandle), namely the Map-Side Partial Aggregation flag. The ExternalSorter uses the aggregator and key ordering when the flag is enabled.
write requests the ExternalSorter to insert all the given records.
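The way the flag steers the sorter setup can be sketched in plain Scala, without Spark on the classpath. Dep, SorterConfig and SorterSetup are hypothetical stand-ins (for ShuffleDependency and for the arguments an ExternalSorter would receive); they are not Spark's API and only model the rule described above.

```scala
// Hypothetical stand-ins, not Spark classes: they model only the fields
// that matter for the map-side partial aggregation decision.
final case class Dep[K, V, C](
    mapSideCombine: Boolean,
    aggregator: Option[(C, V) => C],
    keyOrdering: Option[Ordering[K]])

final case class SorterConfig[K, V, C](
    aggregator: Option[(C, V) => C],
    ordering: Option[Ordering[K]])

object SorterSetup {
  // With map-side partial aggregation enabled the sorter is handed the
  // aggregator and the key ordering; otherwise both are left undefined.
  def configFor[K, V, C](dep: Dep[K, V, C]): SorterConfig[K, V, C] =
    if (dep.mapSideCombine) SorterConfig[K, V, C](dep.aggregator, dep.keyOrdering)
    else SorterConfig[K, V, C](None, None)
}
```

In this model, a dependency with the flag turned off always yields an empty SorterConfig, regardless of which aggregator or ordering it carries.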
Stopping SortShuffleWriter (and Calculating MapStatus)
stop(success: Boolean): Option[MapStatus]
stop is part of the ShuffleWriter abstraction.
Otherwise, when the stopping flag is already enabled or the input success is disabled, stop returns no MapStatus (i.e. None).
In the end, stop requests the ExternalSorter to stop and increments the shuffle write time task metrics.
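The stop contract described above can be modeled in a few lines of plain Scala. This is a sketch under stated assumptions, not Spark's code: WriterModel is a hypothetical class, the MapStatus is replaced by a String, the ExternalSorter by a callback, and the shuffle write time metric by a simple counter. Unlike a real writer, the model invokes the callback on every stop call.

```scala
// Hypothetical model, not Spark's class: MapStatus is replaced by a String
// and the ExternalSorter by a callback so the sketch runs without Spark.
final class WriterModel(mapStatus: String, stopSorter: () => Unit) {
  private var stopping = false
  var writeTimeNanos = 0L // stands in for the shuffle write time metric

  def stop(success: Boolean): Option[String] =
    try {
      if (stopping) None // already stopped: nothing to report
      else {
        stopping = true
        if (success) Some(mapStatus) else None
      }
    } finally {
      // Runs on every path: stop the sorter and record how long that took.
      val start = System.nanoTime()
      stopSorter()
      writeTimeNanos += System.nanoTime() - start
    }
}
```

The try/finally shape keeps the cleanup independent of which value is returned, which matches the "in the end" wording of the description.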
Requirements of BypassMergeSortShuffleHandle (as ShuffleHandle)
shouldBypassMergeSort(conf: SparkConf, dep: ShuffleDependency[_, _, _]): Boolean
shouldBypassMergeSort returns true when all of the following hold:
Otherwise, shouldBypassMergeSort does not hold (false).
shouldBypassMergeSort is used when:
SortShuffleManager is requested to register a shuffle (and creates a ShuffleHandle)
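A minimal sketch of this kind of check in plain Scala, assuming two conditions: the dependency does not use map-side combine, and its number of partitions does not exceed the spark.shuffle.sort.bypassMergeThreshold setting (passed in here as a plain Int). DepInfo and BypassCheck are hypothetical names used for illustration only; the real method takes a SparkConf and a ShuffleDependency.

```scala
// Hypothetical stand-in for the parts of a ShuffleDependency consulted here.
final case class DepInfo(mapSideCombine: Boolean, numPartitions: Int)

object BypassCheck {
  // Assumed conditions: no map-side combine, and a partition count that
  // does not exceed the threshold (spark.shuffle.sort.bypassMergeThreshold).
  def shouldBypassMergeSort(bypassMergeThreshold: Int, dep: DepInfo): Boolean =
    !dep.mapSideCombine && dep.numPartitions <= bypassMergeThreshold
}
```

Under these assumptions, a dependency with map-side combine never qualifies, no matter how few partitions it has.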
stopping is an internal flag that indicates whether or not this SortShuffleWriter has been stopped.
Enable ALL logging level for the org.apache.spark.shuffle.sort.SortShuffleWriter logger to see what happens inside.
Add the following line to
Refer to Logging.