CoalesceShufflePartitions Adaptive Physical Optimization¶
CoalesceShufflePartitions is a physical optimization in Adaptive Query Execution.
Creating Instance¶
CoalesceShufflePartitions takes the following to be created:
CoalesceShufflePartitions is created when:
- AdaptiveSparkPlanExec physical operator is requested for the QueryStage Optimizer Rules
spark.sql.adaptive.coalescePartitions.enabled¶
CoalesceShufflePartitions is enabled by default and can be turned off using the spark.sql.adaptive.coalescePartitions.enabled configuration property.
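A minimal sketch of turning the rule off when building a session. The master URL and application name are placeholders chosen for illustration, not values from the source.

```scala
import org.apache.spark.sql.SparkSession

// Placeholder master and app name (assumptions for a local demo)
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("coalesce-shuffle-partitions-demo")
  // Adaptive Query Execution has to be on for any AQE rule to run
  .config("spark.sql.adaptive.enabled", "true")
  // Turn off partition coalescing only
  .config("spark.sql.adaptive.coalescePartitions.enabled", "false")
  .getOrCreate()

// The property can also be changed at runtime for the current session
spark.conf.set("spark.sql.adaptive.coalescePartitions.enabled", true)
```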
Executing Rule¶
apply does nothing (and simply gives the input SparkPlan back unmodified) when one of the following holds:
- spark.sql.adaptive.coalescePartitions.enabled configuration property is disabled
- The leaf physical operators are not all QueryStageExecs (it is not safe to reduce the number of shuffle partitions then, because doing so may break the assumption that all children of a Spark plan have the same number of output partitions)
- There is a ShuffleExchangeLike among the ShuffleQueryStageExecs (of the given SparkPlan) that is not supported
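The early-exit conditions above can be modeled roughly as follows. This is a toy sketch, not Spark's code: `Plan`, `Operator`, `ShuffleStage`, `OtherLeaf` and `skips` are hypothetical stand-ins for the real physical operators and the rule's checks, and the model treats every query stage as a shuffle stage for simplicity.

```scala
object ApplyGuardSketch {
  // Hypothetical, simplified stand-ins for Spark's physical operators
  sealed trait Plan { def children: Seq[Plan] }
  final case class Operator(children: Seq[Plan]) extends Plan
  // A leaf that models a shuffle query stage; the flag says whether
  // its shuffle origin is one the rule supports
  final case class ShuffleStage(originSupported: Boolean) extends Plan {
    val children: Seq[Plan] = Nil
  }
  // Any leaf that is not a query stage (e.g. a plain scan)
  final case class OtherLeaf(name: String) extends Plan {
    val children: Seq[Plan] = Nil
  }

  // Collects the leaf operators of a plan tree
  def leaves(plan: Plan): Seq[Plan] =
    if (plan.children.isEmpty) Seq(plan) else plan.children.flatMap(leaves)

  // True when the rule would hand the plan back unmodified
  def skips(coalesceEnabled: Boolean, plan: Plan): Boolean = {
    val ls = leaves(plan)
    !coalesceEnabled ||
      !ls.forall(_.isInstanceOf[ShuffleStage]) ||
      ls.exists {
        case ShuffleStage(supported) => !supported
        case _ => false
      }
  }
}
```

In this model, a plan whose leaves are all supported shuffle stages is the only case in which coalescing proceeds.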
apply coalesces the partitions (with the MapOutputStatistics and ShufflePartitionSpec of the ShuffleStageInfos) based on the following settings:
- The minimum number of partitions: the default Spark parallelism when spark.sql.adaptive.coalescePartitions.parallelismFirst is enabled, or 1 otherwise
- spark.sql.adaptive.advisoryPartitionSizeInBytes as the advisory target size of partitions
- spark.sql.adaptive.coalescePartitions.minPartitionSize as the minimum size of partitions
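To illustrate how the three settings above could interact, here is a simplified, self-contained sketch. It is an assumption-laden model, not Spark's implementation: `CoalesceSketch`, `targetSize` and `coalesce` are hypothetical names, the sizes are made-up byte counts, and the real rule works on MapOutputStatistics rather than a plain sequence.

```scala
object CoalesceSketch {
  // Picks a target size: small enough to keep about `minNumPartitions`
  // partitions, capped by the advisory size, floored by the minimum size
  def targetSize(
      sizes: Seq[Long],
      advisoryTargetSize: Long,
      minNumPartitions: Int,
      minPartitionSize: Long): Long = {
    val maxTargetSize = math.ceil(sizes.sum / minNumPartitions.toDouble).toLong
    maxTargetSize.min(advisoryTargetSize).max(minPartitionSize)
  }

  // Greedily packs adjacent partitions into groups of at most `target` bytes.
  // Returns (startInclusive, endExclusive) index ranges.
  def coalesce(sizes: Seq[Long], target: Long): Seq[(Int, Int)] = {
    val ranges = scala.collection.mutable.ArrayBuffer.empty[(Int, Int)]
    var start = 0
    var current = 0L
    for ((size, i) <- sizes.zipWithIndex) {
      // Close the current group if adding this partition would exceed the target
      if (i > start && current + size > target) {
        ranges += ((start, i))
        start = i
        current = 0L
      }
      current += size
    }
    if (sizes.nonEmpty) ranges += ((start, sizes.length))
    ranges.toList
  }
}
```

With partition sizes 10, 20, 30, 40 and 50, an advisory size of 64, a minimum partition count of 1 and a minimum partition size of 1, the target size in this model is 64 and the five partitions collapse into three groups: indices 0 to 2, index 3, and index 4. Raising the minimum partition count to 5 lowers the target to 30, which is how a parallelism-first setting would limit coalescing.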
updateShuffleReads¶
updateShuffleReads(
  plan: SparkPlan,
  specsMap: Map[Int, Seq[ShufflePartitionSpec]]): SparkPlan
updateShuffleReads...FIXME
Supported ShuffleOrigins¶
supportedShuffleOrigins: Seq[ShuffleOrigin]
supportedShuffleOrigins is part of the AQEShuffleReadRule abstraction.
supportedShuffleOrigins is a collection of the following ShuffleOrigins: