CoalesceShufflePartitions Adaptive Physical Optimization
CoalesceShufflePartitions is a physical optimization in Adaptive Query Execution.
CoalesceShufflePartitions takes the following to be created:
CoalesceShufflePartitions is created when:
AdaptiveSparkPlanExec physical operator is requested for the QueryStage Optimizer Rules
CoalesceShufflePartitions is enabled by default and can be turned off using the spark.sql.adaptive.coalescePartitions.enabled configuration property.
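As a minimal sketch, assuming a Spark 3.x application with spark-sql on the classpath, the rule could be turned off while building the session as follows. The object name, application name, and master URL are placeholders chosen for illustration only.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical demo application; only the two configuration keys come from Spark itself.
object DisableCoalescingDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")                       // placeholder master URL
      .appName("coalesce-shuffle-partitions")   // placeholder application name
      // Adaptive Query Execution has to be on for the rule to be considered at all
      .config("spark.sql.adaptive.enabled", "true")
      // Turn CoalesceShufflePartitions off
      .config("spark.sql.adaptive.coalescePartitions.enabled", "false")
      .getOrCreate()

    // Read the setting back to confirm what the session uses
    println(spark.conf.get("spark.sql.adaptive.coalescePartitions.enabled"))

    spark.stop()
  }
}
```

The same property can also be changed on a running session with `spark.conf.set`.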
apply(plan: SparkPlan): SparkPlan
apply is part of the Rule abstraction.
apply does nothing (and simply gives the input SparkPlan back unmodified) when one of the following holds:
spark.sql.adaptive.coalescePartitions.enabled configuration property is disabled
The leaf physical operators are not all QueryStageExecs (reducing the number of shuffle partitions is not safe in that case, because it may break the assumption that all children of a Spark plan have the same number of output partitions).
There is a ShuffleExchangeLike among the ShuffleQueryStageExecs (of the given SparkPlan) that is not supported
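The three conditions above can be sketched with a toy plan tree. This is not Spark's code: `Node`, `Stage`, `Scan`, `Op`, and `shouldCoalesce` are hypothetical stand-ins for SparkPlan, ShuffleQueryStageExec, a non-stage leaf, an inner operator, and the guard inside apply.

```scala
object ApplyGuardSketch {
  // Toy stand-in for a physical plan node (hypothetical, not SparkPlan)
  sealed trait Node { def children: Seq[Node] }

  // Toy stand-in for a shuffle query stage; the flag models whether its shuffle is supported
  final case class Stage(shuffleSupported: Boolean) extends Node {
    val children: Seq[Node] = Nil
  }

  // Toy stand-in for any leaf that is not a query stage (e.g. a file scan)
  final case class Scan(name: String) extends Node {
    val children: Seq[Node] = Nil
  }

  // Toy stand-in for an inner operator such as a join
  final case class Op(name: String, children: Seq[Node]) extends Node

  // Collect the leaves of the tree, left to right
  def leaves(node: Node): Seq[Node] =
    if (node.children.isEmpty) Seq(node) else node.children.flatMap(leaves)

  // True only when none of the three "do nothing" conditions holds
  def shouldCoalesce(enabled: Boolean, plan: Node): Boolean = {
    val ls = leaves(plan)
    val allLeavesAreStages = ls.forall(_.isInstanceOf[Stage])
    val allShufflesSupported = ls.collect { case s: Stage => s }.forall(_.shuffleSupported)
    enabled && allLeavesAreStages && allShufflesSupported
  }
}
```

In this sketch a join over two supported stages passes the guard, while a join with a plain scan on one side is returned unmodified.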
apply coalesces the partitions (with the ShufflePartitionSpec of the ShuffleStageInfos) based on the following settings:
The minimum number of partitions, which is the default Spark parallelism when spark.sql.adaptive.coalescePartitions.parallelismFirst is enabled, or 1 otherwise
spark.sql.adaptive.advisoryPartitionSizeInBytes as the advisory target size of partitions
spark.sql.adaptive.coalescePartitions.minPartitionSize as the minimum size of partitions
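How the three settings interact can be illustrated with a simplified sketch. It is not Spark's implementation: `CoalesceSketch`, `PartitionRange`, `targetSize`, and `coalesce` are hypothetical names, the input is a single list of per-partition byte sizes, and the merging is a plain greedy pass over adjacent partitions that skips any special handling of small leftover partitions.

```scala
object CoalesceSketch {
  // Hypothetical stand-in for a coalesced spec: reducer partitions [start, end)
  final case class PartitionRange(start: Int, end: Int)

  // Derive the effective target size from the three settings:
  // - never exceed the advisory size,
  // - shrink the target if that is needed to reach the minimum number of partitions,
  // - but never go below the minimum partition size.
  def targetSize(
      totalBytes: Long,
      advisoryTargetSize: Long,
      minNumPartitions: Int,
      minPartitionSize: Long): Long = {
    val maxTargetSize = math.ceil(totalBytes / minNumPartitions.toDouble).toLong
    math.max(math.min(maxTargetSize, advisoryTargetSize), minPartitionSize)
  }

  // Greedily merge adjacent partitions while the running size stays within the target
  def coalesce(sizes: Seq[Long], target: Long): Seq[PartitionRange] = {
    val ranges = Seq.newBuilder[PartitionRange]
    var start = 0
    var acc = 0L
    for (i <- sizes.indices) {
      // Close the current range when adding partition i would overshoot the target
      if (i > start && acc + sizes(i) > target) {
        ranges += PartitionRange(start, i)
        start = i
        acc = 0L
      }
      acc += sizes(i)
    }
    if (sizes.nonEmpty) ranges += PartitionRange(start, sizes.length)
    ranges.result()
  }
}
```

With a large minimum number of partitions (parallelism first), `targetSize` drops below the advisory size and fewer partitions get merged; with a minimum of 1, the advisory size is what drives the result.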
updateShuffleReads(plan: SparkPlan, specsMap: Map[Int, Seq[ShufflePartitionSpec]]): SparkPlan
supportedShuffleOrigins is the following
supportedShuffleOrigins is part of the AQEShuffleReadRule abstraction.