Skip to content

AQEShuffleReadExec Physical Operator

AQEShuffleReadExec is a unary physical operator in Adaptive Query Execution.

Creating Instance

AQEShuffleReadExec takes the following to be created:

AQEShuffleReadExec is created when the following adaptive physical optimizations are executed:

Performance Metrics

metrics is part of the SparkPlan abstraction.


metrics is defined only when the shuffleStage is defined.

Lazy Value

metrics is a Scala lazy value to guarantee that the code to initialize it is executed once only (when accessed for the first time) and the computed value never changes afterwards.

Learn more in the Scala Language Specification.

number of coalesced partitions

Only when hasCoalescedPartition

number of partitions

number of skewed partition splits

Only when hasSkewedPartition

number of skewed partitions

Only when hasSkewedPartition

partition data size

Only when non-isLocalRead

Child ShuffleQueryStageExec

shuffleStage: Option[ShuffleQueryStageExec]

AQEShuffleReadExec is given a child physical operator when created.

When requested for a ShuffleQueryStageExec, AQEShuffleReadExec returns the child physical operator (if that is its type or returns None).

shuffleStage is used when:

Shuffle RDD

shuffleRDD: RDD[_]

shuffleRDD updates the performance metrics and requests the shuffleStage for the ShuffleExchangeLike that in turn is requested for the shuffle RDD (with the ShufflePartitionSpecs).

Lazy Value

shuffleRDD is a Scala lazy value to guarantee that the code to initialize it is executed once only (when accessed for the first time) and the computed value never changes afterwards.

Learn more in the Scala Language Specification.

shuffleRDD is used when:

Updating Performance Metrics

sendDriverMetrics(): Unit

sendDriverMetrics posts a SparkListenerDriverAccumUpdates (with the query execution id and performance metrics).

Partition Data Sizes

partitionDataSizes: Option[Seq[Long]]

partitionDataSizes...FIXME

Lazy Value

partitionDataSizes is a Scala lazy value to guarantee that the code to initialize it is executed once only (when accessed for the first time) and the computed value never changes afterwards.

Learn more in the Scala Language Specification.

Executing Physical Operator

doExecute(): RDD[InternalRow]

doExecute is part of the SparkPlan abstraction.


doExecute returns the Shuffle RDD.

Columnar Execution

doExecuteColumnar(): RDD[ColumnarBatch]

doExecuteColumnar is part of the SparkPlan abstraction.


doExecuteColumnar returns the Shuffle RDD.

Node Arguments

stringArgs: Iterator[Any]

stringArgs is part of the TreeNode abstraction.


stringArgs is one of the following:

isLocalRead

isLocalRead: Boolean

isLocalRead indicates whether either PartialMapperPartitionSpec or CoalescedMapperPartitionSpec are among the partition specs or not.


isLocalRead is used when:

isCoalescedRead

isCoalescedRead: Boolean

isCoalescedRead indicates coalesced shuffle read and is whether the partition specs are all CoalescedPartitionSpecs pair-wise (with the endReducerIndex and startReducerIndex being adjacent) or not.


isCoalescedRead is used when:

hasCoalescedPartition

hasCoalescedPartition: Boolean

hasCoalescedPartition is true when there is a CoalescedSpec among the ShufflePartitionSpecs.


hasCoalescedPartition is used when:

isCoalescedSpec

isCoalescedSpec(
  spec: ShufflePartitionSpec)

isCoalescedSpec is true when the given ShufflePartitionSpec is one of the following:

  • CoalescedPartitionSpec (with both startReducerIndex and endReducerIndex as 0s)
  • CoalescedPartitionSpec with endReducerIndex larger than startReducerIndex

Otherwise, isCoalescedSpec is false.


isCoalescedSpec is used when: