AQEShuffleReadExec Physical Operator¶
AQEShuffleReadExec is a unary physical operator in Adaptive Query Execution.
Creating Instance¶
AQEShuffleReadExec takes the following to be created:
- Child physical operator
- ShufflePartitionSpecs (requires at least one partition)
AQEShuffleReadExec is created when the following adaptive physical optimizations are executed:
- CoalesceShufflePartitions
- OptimizeShuffleWithLocalRead
- OptimizeSkewedJoin
- OptimizeSkewInRebalancePartitions
Performance Metrics¶
metrics is part of the SparkPlan abstraction.
metrics is defined only when the shuffleStage is defined.
Lazy Value
metrics is a Scala lazy value, which guarantees that its initialization code is executed only once (when accessed for the first time) and that the computed value never changes afterwards.
Learn more in the Scala Language Specification.
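The once-only initialization of a lazy value can be seen in a minimal sketch. The object name, the initCount counter and the contents of the map are hypothetical and only serve the demonstration; they are not part of AQEShuffleReadExec.

```scala
object LazyValueDemo {
  // Counts how many times the initializer below runs (demonstration only)
  var initCount = 0

  // Hypothetical stand-in for an expensive value such as a metrics map.
  // The block runs on first access only; later accesses reuse the result.
  lazy val metrics: Map[String, Long] = {
    initCount += 1
    Map("number of partitions" -> 0L)
  }
}
```

Accessing LazyValueDemo.metrics any number of times leaves initCount at 1, and every access returns the same Map instance.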
number of coalesced partitions¶
Only when hasCoalescedPartition
number of partitions¶
number of skewed partition splits¶
Only when hasSkewedPartition
number of skewed partitions¶
Only when hasSkewedPartition
partition data size¶
Only when isLocalRead is false
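The conditions under which each metric is defined can be summarized in a small sketch. It models the metrics by their display names only; the object name, the metricNames helper and its boolean parameters are assumptions made for illustration, and the ordering of the result is arbitrary.

```scala
object MetricsSketch {
  // Returns the display names of the metrics that would be defined
  // for the given combination of flags (a simplified model, not Spark code).
  def metricNames(
      hasShuffleStage: Boolean,
      isLocalRead: Boolean,
      hasCoalescedPartition: Boolean,
      hasSkewedPartition: Boolean): Seq[String] = {
    if (!hasShuffleStage) {
      // No shuffle stage, no metrics at all
      Seq.empty
    } else {
      Seq("number of partitions") ++
        (if (!isLocalRead) Seq("partition data size") else Seq.empty) ++
        (if (hasCoalescedPartition) Seq("number of coalesced partitions") else Seq.empty) ++
        (if (hasSkewedPartition)
          Seq("number of skewed partitions", "number of skewed partition splits")
        else Seq.empty)
    }
  }
}
```

For a local read with no coalesced or skewed partitions, only "number of partitions" remains.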
Child ShuffleQueryStageExec¶
shuffleStage: Option[ShuffleQueryStageExec]
AQEShuffleReadExec is given a child physical operator when created.
When requested for a ShuffleQueryStageExec, AQEShuffleReadExec returns the child physical operator if it is a ShuffleQueryStageExec, or None otherwise.
shuffleStage is used when:
- AQEShuffleReadExec is requested for the partitionDataSizes, the performance metrics and the shuffleRDD
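The "child if it is of the right type, None otherwise" behaviour is a plain pattern match. The following is a minimal sketch with hypothetical stand-in types (the SparkPlan trait and both case classes here are simplified models, not Spark's classes).

```scala
object ShuffleStageSketch {
  // Hypothetical, simplified stand-ins for Spark's physical operators
  sealed trait SparkPlan
  final case class ShuffleQueryStageExec(id: Int) extends SparkPlan
  final case class OtherExec(name: String) extends SparkPlan

  // Some(stage) when the child is a ShuffleQueryStageExec, None otherwise
  def shuffleStage(child: SparkPlan): Option[ShuffleQueryStageExec] =
    child match {
      case stage: ShuffleQueryStageExec => Some(stage)
      case _ => None
    }
}
```

Returning an Option lets callers (such as the metrics) skip their work with a simple isDefined check when the child is some other operator.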
Shuffle RDD¶
shuffleRDD: RDD[_]
shuffleRDD updates the performance metrics and requests the shuffleStage for the ShuffleExchangeLike, which in turn is requested for the shuffle RDD (with the ShufflePartitionSpecs).
Lazy Value
shuffleRDD is a Scala lazy value, which guarantees that its initialization code is executed only once (when accessed for the first time) and that the computed value never changes afterwards.
Learn more in the Scala Language Specification.
shuffleRDD is used when:
- AQEShuffleReadExec operator is requested to doExecute and doExecuteColumnar
Updating Performance Metrics¶
sendDriverMetrics(): Unit
sendDriverMetrics posts a SparkListenerDriverAccumUpdates (with the query execution id and performance metrics).
Partition Data Sizes¶
partitionDataSizes: Option[Seq[Long]]
partitionDataSizes
...FIXME
Lazy Value
partitionDataSizes is a Scala lazy value, which guarantees that its initialization code is executed only once (when accessed for the first time) and that the computed value never changes afterwards.
Learn more in the Scala Language Specification.
Executing Physical Operator¶
doExecute(): RDD[InternalRow]
doExecute is part of the SparkPlan abstraction.
doExecute returns the Shuffle RDD.
Columnar Execution¶
doExecuteColumnar(): RDD[ColumnarBatch]
doExecuteColumnar is part of the SparkPlan abstraction.
doExecuteColumnar returns the Shuffle RDD.
Node Arguments¶
stringArgs: Iterator[Any]
stringArgs is part of the TreeNode abstraction.
stringArgs is one of the following:
- local when isLocalRead
- coalesced and skewed when hasCoalescedPartition and hasSkewedPartition
- coalesced when hasCoalescedPartition
- skewed when hasSkewedPartition
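The selection of the node argument can be sketched as a chain of conditions. The sketch assumes that the cases are checked in the order listed (so a local read wins over the other flags) and that an empty string is used when no case applies; both are assumptions, as are the object and method names.

```scala
object StringArgsSketch {
  // Picks the description shown for the node.
  // Assumption: cases are tested in the listed order, with "" as the fallback.
  def description(
      isLocalRead: Boolean,
      hasCoalescedPartition: Boolean,
      hasSkewedPartition: Boolean): String =
    if (isLocalRead) "local"
    else if (hasCoalescedPartition && hasSkewedPartition) "coalesced and skewed"
    else if (hasCoalescedPartition) "coalesced"
    else if (hasSkewedPartition) "skewed"
    else ""

  // stringArgs is an iterator over the single description
  def stringArgs(
      isLocalRead: Boolean,
      hasCoalescedPartition: Boolean,
      hasSkewedPartition: Boolean): Iterator[Any] =
    Iterator(description(isLocalRead, hasCoalescedPartition, hasSkewedPartition))
}
```

The combined "coalesced and skewed" case has to be tested before the single-flag cases, otherwise it could never be reached.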
isLocalRead¶
isLocalRead: Boolean
isLocalRead indicates whether there is a PartialMapperPartitionSpec or a CoalescedMapperPartitionSpec among the partition specs.
isLocalRead is used when:
- AQEShuffleReadExec is requested for the node arguments, the partition data sizes and the performance metrics
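A minimal sketch of the check, using simplified stand-ins for the partition spec classes (the field lists here are assumptions kept short for illustration and may not match Spark's definitions exactly):

```scala
object LocalReadSketch {
  // Simplified, hypothetical models of Spark's partition specs
  sealed trait ShufflePartitionSpec
  final case class CoalescedPartitionSpec(
      startReducerIndex: Int, endReducerIndex: Int) extends ShufflePartitionSpec
  final case class PartialMapperPartitionSpec(
      mapIndex: Int, startReducerIndex: Int, endReducerIndex: Int) extends ShufflePartitionSpec
  final case class CoalescedMapperPartitionSpec(
      startMapIndex: Int, endMapIndex: Int, numReducers: Int) extends ShufflePartitionSpec

  // true when at least one spec is a mapper-side (local) spec
  def isLocalRead(specs: Seq[ShufflePartitionSpec]): Boolean =
    specs.exists {
      case _: PartialMapperPartitionSpec | _: CoalescedMapperPartitionSpec => true
      case _ => false
    }
}
```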
isCoalescedRead¶
isCoalescedRead: Boolean
isCoalescedRead indicates a coalesced shuffle read, i.e. whether the partition specs are all CoalescedPartitionSpecs that are adjacent pair-wise (the endReducerIndex of one spec and the startReducerIndex of the next one being adjacent).
isCoalescedRead is used when:
- AQEShuffleReadExec is requested for the outputPartitioning
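The pair-wise check can be sketched with a sliding window of size 2 over the specs. The types below are simplified stand-ins, and "adjacent" is read here as the left spec ending at or before the index where the right spec starts; that reading is an assumption.

```scala
object CoalescedReadSketch {
  // Simplified, hypothetical models of Spark's partition specs
  sealed trait ShufflePartitionSpec
  final case class CoalescedPartitionSpec(
      startReducerIndex: Int, endReducerIndex: Int) extends ShufflePartitionSpec
  final case class OtherSpec(index: Int) extends ShufflePartitionSpec

  def isCoalescedRead(specs: Seq[ShufflePartitionSpec]): Boolean =
    specs.sliding(2).forall {
      // A single coalesced spec forms a window of one element
      case Seq(_: CoalescedPartitionSpec) => true
      // Two neighbours: the left one must end where (or before) the right one starts
      case Seq(l: CoalescedPartitionSpec, r: CoalescedPartitionSpec) =>
        l.endReducerIndex <= r.startReducerIndex
      // Any other kind of spec in the window rules out a coalesced read
      case _ => false
    }
}
```

With Seq(CoalescedPartitionSpec(0, 2), CoalescedPartitionSpec(2, 5)) the check passes, while overlapping ranges or any non-coalesced spec make it fail.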
hasCoalescedPartition¶
hasCoalescedPartition: Boolean
hasCoalescedPartition is true when at least one of the ShufflePartitionSpecs is a coalesced spec (see isCoalescedSpec).
hasCoalescedPartition is used when:
- AQEShuffleReadExec is requested for the stringArgs, sendDriverMetrics, and metrics
isCoalescedSpec¶
isCoalescedSpec(
spec: ShufflePartitionSpec)
isCoalescedSpec is true when the given ShufflePartitionSpec is one of the following:
- CoalescedPartitionSpec (with both startReducerIndex and endReducerIndex as 0s)
- CoalescedPartitionSpec with endReducerIndex larger than startReducerIndex
Otherwise, isCoalescedSpec is false.
isCoalescedSpec is used when:
- AQEShuffleReadExec is requested for hasCoalescedPartition and to sendDriverMetrics
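The two cases translate into a pattern match, and hasCoalescedPartition then reduces to an exists over the specs. The sketch below uses simplified stand-in types and follows the wording of this section literally ("endReducerIndex larger than startReducerIndex"); the exact comparison in the Spark source may be stricter, so treat the threshold as an assumption.

```scala
object CoalescedSpecSketch {
  // Simplified, hypothetical models of Spark's partition specs
  sealed trait ShufflePartitionSpec
  final case class CoalescedPartitionSpec(
      startReducerIndex: Int, endReducerIndex: Int) extends ShufflePartitionSpec
  final case class OtherSpec(index: Int) extends ShufflePartitionSpec

  def isCoalescedSpec(spec: ShufflePartitionSpec): Boolean = spec match {
    // Special case: both indices are 0
    case CoalescedPartitionSpec(0, 0) => true
    // Assumption: "larger than" is taken literally from the text
    case s: CoalescedPartitionSpec => s.endReducerIndex > s.startReducerIndex
    case _ => false
  }

  // At least one spec qualifies as coalesced
  def hasCoalescedPartition(specs: Seq[ShufflePartitionSpec]): Boolean =
    specs.exists(isCoalescedSpec)
}
```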