CoalesceShufflePartitions Adaptive Physical Optimization¶
CoalesceShufflePartitions is a physical optimization in Adaptive Query Execution.
Creating Instance¶
CoalesceShufflePartitions takes the following to be created:
CoalesceShufflePartitions is created when:
- AdaptiveSparkPlanExec physical operator is requested for the QueryStage Optimizer Rules
spark.sql.adaptive.coalescePartitions.enabled¶
CoalesceShufflePartitions is enabled by default and can be turned off using the spark.sql.adaptive.coalescePartitions.enabled configuration property.
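A minimal sketch of turning the rule off at runtime, assuming Spark is on the classpath. The application name and the local master URL are placeholders chosen for illustration. Since the rule belongs to Adaptive Query Execution, it should only matter when spark.sql.adaptive.enabled is on as well.

```scala
import org.apache.spark.sql.SparkSession

object DisableCoalescePartitions extends App {
  // Placeholder application name and master URL (assumptions for a local run)
  val spark = SparkSession.builder()
    .appName("coalesce-partitions-demo")
    .master("local[*]")
    .getOrCreate()

  // Turn the CoalesceShufflePartitions optimization off for this session
  spark.conf.set("spark.sql.adaptive.coalescePartitions.enabled", false)

  // Read the value back to confirm the session-level setting
  println(spark.conf.get("spark.sql.adaptive.coalescePartitions.enabled"))

  spark.stop()
}
```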
Executing Rule¶
apply(
plan: SparkPlan): SparkPlan
apply is part of the Rule abstraction.
apply does nothing (and simply returns the input SparkPlan unmodified) when any of the following holds:
- The spark.sql.adaptive.coalescePartitions.enabled configuration property is disabled
- Not all leaf physical operators are QueryStageExecs (reducing the number of shuffle partitions is then not safe, because it may break the assumption that all children of a Spark plan have the same number of output partitions)
- There is a ShuffleExchangeLike among the ShuffleQueryStageExecs (of the given SparkPlan) that is not supported (see Supported ShuffleOrigins)
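The three conditions above can be sketched with a toy plan model. This is a simplified illustration only, not Spark's SparkPlan API: Plan, Operator, Scan, ShuffleStage, leaves and shouldCoalesce are hypothetical names, and every query stage is treated as a shuffle stage for brevity.

```scala
object ApplyGuardSketch {
  // Hypothetical, simplified plan model (not Spark's SparkPlan hierarchy)
  sealed trait Plan { def children: Seq[Plan] }
  final case class Operator(children: Seq[Plan]) extends Plan
  // A leaf that is not a query stage, e.g. a plain file scan
  final case class Scan(name: String) extends Plan {
    val children: Seq[Plan] = Nil
  }
  // A leaf that stands in for a ShuffleQueryStageExec
  final case class ShuffleStage(id: Int, originSupported: Boolean) extends Plan {
    val children: Seq[Plan] = Nil
  }

  // Collects the leaf nodes of a plan tree
  def leaves(plan: Plan): Seq[Plan] =
    if (plan.children.isEmpty) Seq(plan) else plan.children.flatMap(leaves)

  // True only when none of the three "do nothing" conditions holds
  def shouldCoalesce(plan: Plan, enabled: Boolean): Boolean = {
    val ls = leaves(plan)
    enabled &&
      ls.forall(_.isInstanceOf[ShuffleStage]) &&
      ls.forall {
        case s: ShuffleStage => s.originSupported
        case _ => true
      }
  }
}
```

In this model a plan that mixes a Scan leaf with a ShuffleStage leaf is left alone, which mirrors the second condition.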
apply coalesces the partitions (with the MapOutputStatistics and ShufflePartitionSpecs of the ShuffleStageInfos) based on the following settings:
- The minimum number of partitions: the default Spark parallelism when spark.sql.adaptive.coalescePartitions.parallelismFirst is enabled, or 1 otherwise
- spark.sql.adaptive.advisoryPartitionSizeInBytes as the advisory target size of partitions
- spark.sql.adaptive.coalescePartitions.minPartitionSize as the minimum size of partitions
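To illustrate how the three settings could interact, here is a simplified sketch in plain Scala. It is not Spark's implementation: CoalesceSketch, Spec, targetSize and coalesce are hypothetical names, and the sketch assumes that the target size is the advisory size, capped so that at least the minimum number of partitions can be produced and floored at the minimum partition size. Adjacent partitions are then packed greedily up to that target.

```scala
object CoalesceSketch {
  // Hypothetical stand-in for a coalesced partition spec:
  // covers reducer partitions in the index range [start, end)
  final case class Spec(start: Int, end: Int)

  // Assumed rule: advisory size, capped by totalSize / minNum, floored at minSize
  def targetSize(sizes: Seq[Long], advisory: Long, minNum: Int, minSize: Long): Long = {
    val maxTarget = math.ceil(sizes.sum / minNum.toDouble).toLong
    math.max(minSize, math.min(advisory, maxTarget))
  }

  // Greedily merges adjacent partitions while the running size stays within target
  def coalesce(sizes: Seq[Long], target: Long): Seq[Spec] = {
    val specs = Seq.newBuilder[Spec]
    var start = 0
    var current = 0L
    for ((size, i) <- sizes.zipWithIndex) {
      // Close the current group if adding this partition would exceed the target
      if (i > start && current + size > target) {
        specs += Spec(start, i)
        start = i
        current = 0L
      }
      current += size
    }
    if (sizes.nonEmpty) specs += Spec(start, sizes.length)
    specs.result()
  }
}
```

With partition sizes 10, 20, 30, 40, 50 and a target of 64, the first three partitions end up in one group and the last two stay on their own. Raising the minimum number of partitions lowers the target and so yields more, smaller groups.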
updateShuffleReads¶
updateShuffleReads(
plan: SparkPlan,
specsMap: Map[Int, Seq[ShufflePartitionSpec]]): SparkPlan
updateShuffleReads
...FIXME
Supported ShuffleOrigins¶
supportedShuffleOrigins: Seq[ShuffleOrigin]
supportedShuffleOrigins is the following ShuffleOrigins:
- ENSURE_REQUIREMENTS
- REPARTITION_BY_COL
- REBALANCE_PARTITIONS_BY_NONE
- REBALANCE_PARTITIONS_BY_COL
supportedShuffleOrigins is part of the AQEShuffleReadRule abstraction.
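A minimal sketch of how a list of supported origins can gate the rule, written in plain Scala without a Spark dependency. The case objects only mirror the ShuffleOrigin names; REPARTITION_BY_NUM is not in the list above and is included here as an example of an origin that would be rejected. The isSupported helper is an assumption about how the list is consulted.

```scala
object ShuffleOriginSketch {
  // Stand-ins for Spark's ShuffleOrigin values (names only, no Spark classes)
  sealed trait ShuffleOrigin
  case object ENSURE_REQUIREMENTS extends ShuffleOrigin
  case object REPARTITION_BY_COL extends ShuffleOrigin
  case object REPARTITION_BY_NUM extends ShuffleOrigin
  case object REBALANCE_PARTITIONS_BY_NONE extends ShuffleOrigin
  case object REBALANCE_PARTITIONS_BY_COL extends ShuffleOrigin

  // The origins listed for CoalesceShufflePartitions
  val supportedShuffleOrigins: Seq[ShuffleOrigin] = Seq(
    ENSURE_REQUIREMENTS,
    REPARTITION_BY_COL,
    REBALANCE_PARTITIONS_BY_NONE,
    REBALANCE_PARTITIONS_BY_COL)

  // Assumed check: a shuffle qualifies only if its origin is in the list
  def isSupported(origin: ShuffleOrigin): Boolean =
    supportedShuffleOrigins.contains(origin)
}
```

Leaving REPARTITION_BY_NUM out fits the intuition that a partition count requested explicitly by the user should not be changed by the optimizer.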