CoalesceShufflePartitions Adaptive Physical Optimization

CoalesceShufflePartitions is a physical optimization in Adaptive Query Execution.

Creating Instance

CoalesceShufflePartitions takes the following to be created:

CoalesceShufflePartitions is created when:

spark.sql.adaptive.coalescePartitions.enabled

CoalesceShufflePartitions is enabled by default and can be turned off using the spark.sql.adaptive.coalescePartitions.enabled configuration property.
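
As a minimal sketch, the property can be changed at runtime through the session configuration. The object name, application name and local master below are placeholders for illustration, and the snippet assumes Spark SQL is on the classpath:

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical demo object; names and master URL are placeholders
object DisableCoalesceDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("disable-coalesce-demo")
      .getOrCreate()

    // Turn the CoalesceShufflePartitions optimization off for this session
    spark.conf.set("spark.sql.adaptive.coalescePartitions.enabled", false)

    // Read the current value back to confirm the change
    println(spark.conf.get("spark.sql.adaptive.coalescePartitions.enabled"))

    spark.stop()
  }
}
```

The same property can also be passed on the command line with `--conf` when submitting an application.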

Executing Rule

Rule
apply(
  plan: SparkPlan): SparkPlan

apply is part of the Rule abstraction.

apply does nothing (and simply gives the input SparkPlan back unmodified) when one of the following holds:

  1. The spark.sql.adaptive.coalescePartitions.enabled configuration property is disabled.

  2. The leaf physical operators are not all QueryStageExecs. In that case it is not safe to reduce the number of shuffle partitions, as doing so may break the assumption that all children of a Spark plan have the same number of output partitions.

  3. There is a ShuffleExchangeLike among the ShuffleQueryStageExecs (of the given SparkPlan) that is not supported.
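
The three early-exit conditions above can be sketched with a toy plan model. None of the types below are Spark classes: Node, Scan, Stage, Op and ApplyGuards are hypothetical stand-ins, and the origin names are placeholders rather than real ShuffleOrigins.

```scala
// Toy stand-ins for physical operators (not Spark classes)
sealed trait Node { def children: Seq[Node] }

// A leaf that is not a query stage, e.g. a file scan
final case class Scan(name: String) extends Node {
  val children: Seq[Node] = Nil
}

// A query stage leaf; shuffleOrigin is None for a non-shuffle stage
final case class Stage(id: Int, shuffleOrigin: Option[String]) extends Node {
  val children: Seq[Node] = Nil
}

// Any inner operator
final case class Op(name: String, children: Seq[Node]) extends Node

object ApplyGuards {
  // Collect all leaves of the plan tree
  def leaves(plan: Node): Seq[Node] =
    if (plan.children.isEmpty) Seq(plan) else plan.children.flatMap(leaves)

  // True when the rule should hand the plan back unmodified
  def shouldSkip(
      plan: Node,
      enabled: Boolean,
      supportedOrigins: Set[String]): Boolean = {
    val ls = leaves(plan)
    // Origins of the shuffle stages found among the leaves
    val origins = ls.collect { case Stage(_, Some(origin)) => origin }

    !enabled ||                                   // condition 1
      ls.exists(!_.isInstanceOf[Stage]) ||        // condition 2
      origins.exists(o => !supportedOrigins(o))   // condition 3
  }
}
```

In this model a plan is only eligible when the rule is enabled, every leaf is a Stage, and every shuffle stage has a supported origin.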

apply coalesces the partitions (with the MapOutputStatistics and ShufflePartitionSpec of the ShuffleStageInfos) based on the following settings:
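
To illustrate what coalescing means here, the following is a minimal sketch, not Spark's implementation. It assumes a single target size in bytes (a hypothetical parameter, not one of the rule's actual settings) and merges adjacent shuffle partitions greedily until adding the next one would exceed that target. CoalesceSketch and PartitionRange are made-up names.

```scala
object CoalesceSketch {
  // Half-open range [start, end) of original shuffle partition indices
  final case class PartitionRange(start: Int, end: Int)

  // sizes(i) is the size in bytes of shuffle partition i
  def coalesce(sizes: Seq[Long], targetSize: Long): Seq[PartitionRange] = {
    val ranges = Seq.newBuilder[PartitionRange]
    var start = 0   // first partition index of the current group
    var acc = 0L    // accumulated size of the current group

    for ((size, i) <- sizes.zipWithIndex) {
      // Close the current (non-empty) group if this partition would overflow it
      if (i > start && acc + size > targetSize) {
        ranges += PartitionRange(start, i)
        start = i
        acc = 0L
      }
      acc += size
    }

    // Emit the last group, if there was any input at all
    if (sizes.nonEmpty) ranges += PartitionRange(start, sizes.length)
    ranges.result()
  }
}
```

A partition larger than the target ends up in a group of its own, since this sketch only merges partitions and never splits them.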

updateShuffleReads

updateShuffleReads(
  plan: SparkPlan,
  specsMap: Map[Int, Seq[ShufflePartitionSpec]]): SparkPlan

updateShuffleReads...FIXME

Supported ShuffleOrigins

AQEShuffleReadRule
supportedShuffleOrigins: Seq[ShuffleOrigin]

supportedShuffleOrigins is part of the AQEShuffleReadRule abstraction.

supportedShuffleOrigins is a collection of the following ShuffleOrigins: