ShuffleQueryStageExec Adaptive Leaf Physical Operator¶
ShuffleQueryStageExec takes the following to be created:
ShuffleQueryStageExec is created when:
- AdaptiveSparkPlanExec physical operator is requested to newQueryStage (for a ShuffleExchangeExec)
ShuffleQueryStageExecphysical operator is requested to newReuseInstance
ShuffleQueryStageExec initializes the
shuffle internal registry when created.
ShuffleQueryStageExec throws an
wrong plan for shuffle stage: [tree]
shuffle is used when:
AQEShuffleReadExecunary physical operator is requested for the shuffleRDD
- CoalesceShufflePartitions physical optimization is executed
- OptimizeShuffleWithLocalRead physical optimization is executed
- OptimizeSkewedJoin physical optimization is executed
- OptimizeSkewInRebalancePartitions physical optimization is executed
ShuffleQueryStageExecleaf physical operator is requested for the shuffle MapOutputStatistics, newReuseInstance and getRuntimeStatistics
Shuffle MapOutputStatistics Future¶
shuffleFuture is a Scala lazy value to guarantee that the code to initialize it is executed once only (when accessed for the first time) and the computed value never changes afterwards.
Learn more in the Scala Language Specification.
shuffleFuture is used when:
doMaterialize is part of the QueryStageExec abstraction.
doMaterialize returns the Shuffle MapOutputStatistics Future.
cancel is part of the QueryStageExec abstraction.
cancel cancels the Shuffle MapOutputStatistics Future (unless already completed).
New ShuffleQueryStageExec Instance for Reuse¶
newReuseInstance( newStageId: Int, newOutput: Seq[Attribute]): QueryStageExec
newReuseInstance is part of the QueryStageExec abstraction.
newReuseInstance creates a new
ShuffleQueryStageExec with the following:
|Query Stage ID||The given |
|SparkPlan||A new ReusedExchangeExec with the given |
newReuseInstance requests the new
ShuffleQueryStageExec to use the _resultOption.
mapStats assumes that the MapOutputStatistics is already available or throws an
assertion failed: ShuffleQueryStageExec should already be ready
mapStats returns the MapOutputStatistics.
mapStats is used when:
AQEShuffleReadExecunary physical operator is requested for the partitionDataSizes
- DynamicJoinSelection adaptive optimization is executed (and selectJoinStrategy)
- OptimizeShuffleWithLocalRead adaptive physical optimization is executed (and canUseLocalShuffleRead)
- CoalesceShufflePartitions, OptimizeSkewedJoin and OptimizeSkewInRebalancePartitions adaptive physical optimizations are executed
getRuntimeStatistics is part of the QueryStageExec abstraction.
ALL logging level for
org.apache.spark.sql.execution.adaptive.ShuffleQueryStageExec logger to see what happens inside.
Add the following line to
logger.ShuffleQueryStageExec.name = org.apache.spark.sql.execution.adaptive.ShuffleQueryStageExec logger.ShuffleQueryStageExec.level = all
Refer to Logging.