CoalesceShufflePartitions Adaptive Physical Optimization¶
CoalesceShufflePartitions
is a physical optimization in Adaptive Query Execution.
Creating Instance¶
CoalesceShufflePartitions
takes the following to be created:
CoalesceShufflePartitions
is created when:
AdaptiveSparkPlanExec
physical operator is requested for the QueryStage Optimizer Rules
spark.sql.adaptive.coalescePartitions.enabled¶
CoalesceShufflePartitions
is enabled by default and can be turned off using spark.sql.adaptive.coalescePartitions.enabled configuration property.
Executing Rule¶
apply
does nothing (and simply gives the input SparkPlan back unmodified) when one of the following holds:
-
spark.sql.adaptive.coalescePartitions.enabled configuration property is disabled
-
The leaf physical operators are not all QueryStageExecs (as it's not safe to reduce the number of shuffle partitions, because it may break the assumption that all children of a spark plan have same number of output partitions).
-
There is a ShuffleExchangeLike among the ShuffleQueryStageExecs (of the given SparkPlan) that is not supported
apply
coalesces the partitions (with the MapOutputStatistics
and ShufflePartitionSpec
of the ShuffleStageInfo
s) based on the following settings:
-
The minimum number of partitions being the default Spark parallelism with spark.sql.adaptive.coalescePartitions.parallelismFirst enabled or
1
-
spark.sql.adaptive.advisoryPartitionSizeInBytes as the advisory target size of partitions
-
spark.sql.adaptive.coalescePartitions.minPartitionSize as the minimum size of partitions
updateShuffleReads¶
updateShuffleReads(
plan: SparkPlan,
specsMap: Map[Int, Seq[ShufflePartitionSpec]]): SparkPlan
updateShuffleReads
...FIXME
Supported ShuffleOrigins¶
AQEShuffleReadRule
supportedShuffleOrigins: Seq[ShuffleOrigin]
supportedShuffleOrigins
is part of the AQEShuffleReadRule abstraction.
supportedShuffleOrigins
is a collection of the following ShuffleOrigins: