AQEShuffleReadExec Physical Operator¶
AQEShuffleReadExec is a unary physical operator in Adaptive Query Execution.
Creating Instance¶
AQEShuffleReadExec takes the following to be created:
- Child physical operator
- ShufflePartitionSpecs (requires at least one partition)
AQEShuffleReadExec is created when the following adaptive physical optimizations are executed:
- CoalesceShufflePartitions
- OptimizeShuffleWithLocalRead
- OptimizeSkewedJoin
- OptimizeSkewInRebalancePartitions
Performance Metrics¶
metrics is part of the SparkPlan abstraction.
metrics is defined only when the shuffleStage is defined.
Lazy Value
metrics is a Scala lazy value to guarantee that the code to initialize it is executed once only (when accessed for the first time) and the computed value never changes afterwards.
Learn more in the Scala Language Specification.
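The once-only initialization described above can be seen in a minimal, Spark-free sketch. The object name, the counter, and the map contents are illustrative assumptions and are not taken from Spark itself.

```scala
object LazyValSketch {
  // Counts how many times the initializer body runs (for demonstration only).
  var initCount = 0

  // The right-hand side runs on first access; the result is cached afterwards.
  // The key and value are hypothetical placeholders.
  lazy val metrics: Map[String, Long] = {
    initCount += 1
    Map("numPartitions" -> 8L)
  }
}
```

Reading `LazyValSketch.metrics` any number of times leaves `initCount` at 1, since the initializer body is evaluated only on the first access.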
number of coalesced partitions¶
Only when hasCoalescedPartition
number of partitions¶
number of skewed partition splits¶
Only when hasSkewedPartition
number of skewed partitions¶
Only when hasSkewedPartition
partition data size¶
Only when not isLocalRead
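The conditions under which each metric is defined can be summarized in a small sketch. It is a simplified model, not Spark's code: the function name, the boolean parameters, and the ordering of the returned names are assumptions made for illustration.

```scala
object MetricsSketch {
  // Returns the display names of the metrics that would be defined
  // for the given combination of flags (a model of the rules above).
  def metricNames(
      hasShuffleStage: Boolean,
      isLocalRead: Boolean,
      hasCoalescedPartition: Boolean,
      hasSkewedPartition: Boolean): Seq[String] =
    if (!hasShuffleStage) Seq.empty // no metrics without a shuffle stage
    else {
      val always = Seq("number of partitions")
      val dataSize =
        if (!isLocalRead) Seq("partition data size") else Seq.empty
      val coalesced =
        if (hasCoalescedPartition) Seq("number of coalesced partitions") else Seq.empty
      val skewed =
        if (hasSkewedPartition)
          Seq("number of skewed partitions", "number of skewed partition splits")
        else Seq.empty
      always ++ dataSize ++ coalesced ++ skewed
    }
}
```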
Child ShuffleQueryStageExec¶
shuffleStage: Option[ShuffleQueryStageExec]
AQEShuffleReadExec is given a child physical operator when created.
When requested for a ShuffleQueryStageExec, AQEShuffleReadExec returns the child physical operator if it is a ShuffleQueryStageExec, or None otherwise.
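The lookup above amounts to a pattern match on the child operator. The following sketch uses simplified stand-in types (they only share names with Spark's classes, and `OtherExec` is hypothetical), so it runs without Spark on the classpath.

```scala
object ShuffleStageSketch {
  // Simplified stand-ins for Spark's physical operators (not the real classes).
  sealed trait SparkPlan
  final case class ShuffleQueryStageExec(id: Int) extends SparkPlan
  final case class OtherExec(name: String) extends SparkPlan // hypothetical

  // Some(stage) when the child is a shuffle query stage, None otherwise.
  def shuffleStage(child: SparkPlan): Option[ShuffleQueryStageExec] =
    child match {
      case stage: ShuffleQueryStageExec => Some(stage)
      case _                            => None
    }
}
```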
shuffleStage is used when:
- AQEShuffleReadExec is requested for the partitionDataSizes, the performance metrics and the shuffleRDD
Shuffle RDD¶
shuffleRDD: RDD[_]
shuffleRDD updates the performance metrics and then requests the shuffleStage for its ShuffleExchangeLike, which in turn is requested for the shuffle RDD (with the ShufflePartitionSpecs).
Lazy Value
shuffleRDD is a Scala lazy value to guarantee that the code to initialize it is executed once only (when accessed for the first time) and the computed value never changes afterwards.
Learn more in the Scala Language Specification.
shuffleRDD is used when:
- AQEShuffleReadExec operator is requested to doExecute and doExecuteColumnar
Updating Performance Metrics¶
sendDriverMetrics(): Unit
sendDriverMetrics posts a SparkListenerDriverAccumUpdates (with the query execution id and performance metrics).
Partition Data Sizes¶
partitionDataSizes: Option[Seq[Long]]
partitionDataSizes...FIXME
Lazy Value
partitionDataSizes is a Scala lazy value to guarantee that the code to initialize it is executed once only (when accessed for the first time) and the computed value never changes afterwards.
Learn more in the Scala Language Specification.
Executing Physical Operator¶
doExecute(): RDD[InternalRow]
doExecute is part of the SparkPlan abstraction.
doExecute returns the Shuffle RDD.
Columnar Execution¶
doExecuteColumnar(): RDD[ColumnarBatch]
doExecuteColumnar is part of the SparkPlan abstraction.
doExecuteColumnar returns the Shuffle RDD.
Node Arguments¶
stringArgs: Iterator[Any]
stringArgs is part of the TreeNode abstraction.
stringArgs is one of the following:
- local when isLocalRead
- coalesced and skewed when hasCoalescedPartition and hasSkewedPartition
- coalesced when hasCoalescedPartition
- skewed when hasSkewedPartition
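The selection of the node argument can be sketched as a chain of conditions. This is a model under two assumptions: the conditions are checked in the order listed, and an empty string is used when none of them holds (the text does not say what happens in that case). The function name is hypothetical.

```scala
object StringArgsSketch {
  // Picks the description for the three flags, checking in the listed order.
  def description(
      isLocalRead: Boolean,
      hasCoalescedPartition: Boolean,
      hasSkewedPartition: Boolean): String =
    if (isLocalRead) "local"
    else if (hasCoalescedPartition && hasSkewedPartition) "coalesced and skewed"
    else if (hasCoalescedPartition) "coalesced"
    else if (hasSkewedPartition) "skewed"
    else "" // assumption: fallback when no condition applies
}
```

For example, `StringArgsSketch.description(false, true, true)` gives `"coalesced and skewed"`, while a local read wins over the other flags in this model.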
isLocalRead¶
isLocalRead: Boolean
isLocalRead indicates whether a PartialMapperPartitionSpec or a CoalescedMapperPartitionSpec is among the partition specs.
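A minimal sketch of this check, using simplified stand-in case classes rather than Spark's own (the field lists are assumptions kept only to make the types distinct):

```scala
object LocalReadSketch {
  // Simplified stand-ins for Spark's partition specs (not the real classes).
  sealed trait ShufflePartitionSpec
  final case class CoalescedPartitionSpec(
      startReducerIndex: Int, endReducerIndex: Int) extends ShufflePartitionSpec
  final case class PartialMapperPartitionSpec(
      mapIndex: Int, startReducerIndex: Int, endReducerIndex: Int)
    extends ShufflePartitionSpec
  final case class CoalescedMapperPartitionSpec(
      startMapIndex: Int, endMapIndex: Int, numReducers: Int)
    extends ShufflePartitionSpec

  // True as soon as one mapper-side spec appears among the specs.
  def isLocalRead(specs: Seq[ShufflePartitionSpec]): Boolean =
    specs.exists {
      case _: PartialMapperPartitionSpec | _: CoalescedMapperPartitionSpec => true
      case _ => false
    }
}
```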
isLocalRead is used when:
- AQEShuffleReadExec is requested for the node arguments, the partition data sizes and the performance metrics
isCoalescedRead¶
isCoalescedRead: Boolean
isCoalescedRead indicates a coalesced shuffle read. It is true when the partition specs, taken pair-wise, are all CoalescedPartitionSpecs (with the endReducerIndex of one spec and the startReducerIndex of the next being adjacent), and false otherwise.
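The pair-wise check can be sketched with `sliding(2)` over the specs. The case classes are simplified stand-ins, and "adjacent" is interpreted here as the left spec ending no later than the right spec starts, which is an assumption about the exact comparison. Treating a single CoalescedPartitionSpec as a coalesced read is likewise an assumption of this sketch.

```scala
object CoalescedReadSketch {
  // Simplified stand-ins for Spark's partition specs (not the real classes).
  sealed trait ShufflePartitionSpec
  final case class CoalescedPartitionSpec(
      startReducerIndex: Int, endReducerIndex: Int) extends ShufflePartitionSpec
  final case class PartialReducerPartitionSpec(
      reducerIndex: Int, startMapIndex: Int, endMapIndex: Int)
    extends ShufflePartitionSpec

  def isCoalescedRead(specs: Seq[ShufflePartitionSpec]): Boolean =
    specs.sliding(2).forall {
      // A lone spec produces a single window of size one.
      case Seq(_: CoalescedPartitionSpec) => true
      // Two neighbours: both coalesced, left ends where (or before) right starts.
      case Seq(l: CoalescedPartitionSpec, r: CoalescedPartitionSpec) =>
        l.endReducerIndex <= r.startReducerIndex
      case _ => false
    }
}
```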
isCoalescedRead is used when:
- AQEShuffleReadExec is requested for the outputPartitioning
hasCoalescedPartition¶
hasCoalescedPartition: Boolean
hasCoalescedPartition is true when at least one of the ShufflePartitionSpecs is a coalesced spec (as determined by isCoalescedSpec).
hasCoalescedPartition is used when:
- AQEShuffleReadExec is requested for the stringArgs, sendDriverMetrics, and metrics
isCoalescedSpec¶
isCoalescedSpec(
spec: ShufflePartitionSpec)
isCoalescedSpec is true when the given ShufflePartitionSpec is one of the following:
- CoalescedPartitionSpec (with both startReducerIndex and endReducerIndex as 0s)
- CoalescedPartitionSpec with endReducerIndex larger than startReducerIndex
Otherwise, isCoalescedSpec is false.
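The rules above, together with hasCoalescedPartition, can be sketched as follows. The case classes are simplified stand-ins, and the comparison follows the description on this page literally. The actual Spark implementation may require a wider gap between the two indices, so treat the second case as an assumption.

```scala
object CoalescedSpecSketch {
  // Simplified stand-ins for Spark's partition specs (not the real classes).
  sealed trait ShufflePartitionSpec
  final case class CoalescedPartitionSpec(
      startReducerIndex: Int, endReducerIndex: Int) extends ShufflePartitionSpec
  final case class PartialReducerPartitionSpec(
      reducerIndex: Int, startMapIndex: Int, endMapIndex: Int)
    extends ShufflePartitionSpec

  def isCoalescedSpec(spec: ShufflePartitionSpec): Boolean = spec match {
    // Both indices are 0.
    case CoalescedPartitionSpec(0, 0) => true
    // Literal reading of the text: end index larger than start index.
    case s: CoalescedPartitionSpec => s.endReducerIndex > s.startReducerIndex
    case _ => false
  }

  // At least one spec qualifies as coalesced.
  def hasCoalescedPartition(specs: Seq[ShufflePartitionSpec]): Boolean =
    specs.exists(isCoalescedSpec)
}
```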
isCoalescedSpec is used when:
- AQEShuffleReadExec is requested to hasCoalescedPartition and sendDriverMetrics