QueryStageExec Leaf Physical Operators¶
QueryStageExec is an extension of the LeafExecNode abstraction for leaf physical operators for Adaptive Query Execution.
Contract¶
Cancelling¶
cancel(): Unit
Cancels the stage materialization if in progress; otherwise does nothing.
Used when:
- AdaptiveSparkPlanExec physical operator is requested to cleanUpAndThrowException
Materializing¶
doMaterialize(): Future[Any]
Used when:
- QueryStageExec is requested to materialize
Runtime Statistics¶
getRuntimeStatistics: Statistics
Statistics after stage materialization
Used when:
- AQEPropagateEmptyRelation logical optimization is requested for an estimated row count
- QueryStageExec is requested to compute statistics
Query Stage ID¶
id: Int
Used when:
- CoalesceShufflePartitions adaptive physical optimization is executed
New ShuffleQueryStageExec Instance for Reuse¶
newReuseInstance(
  newStageId: Int,
  newOutput: Seq[Attribute]): QueryStageExec
Used when:
- AdaptiveSparkPlanExec physical operator is requested to reuseQueryStage
Physical Query Plan¶
plan: SparkPlan
The sub-tree of the main query plan of this query stage (that acts like a child operator, but QueryStageExec is a LeafExecNode and has no children)
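To make the shape of the contract easier to see, here is a minimal sketch in plain Scala. It is not Spark code: `QueryStageLike`, `DummyStage`, `LocalPlan` and the stand-in `Statistics`, `Attribute` and `SparkPlan` types are all hypothetical and exist only so the sketch compiles without Spark on the classpath.

```scala
import scala.concurrent.Future

// Stand-ins for the Spark types named in the contract (hypothetical, heavily simplified).
final case class Statistics(sizeInBytes: BigInt, rowCount: Option[BigInt] = None)
final case class Attribute(name: String)
trait SparkPlan { def output: Seq[Attribute] }
final case class LocalPlan(output: Seq[Attribute]) extends SparkPlan

// The six contract members described above, as a plain trait.
trait QueryStageLike {
  def id: Int
  def plan: SparkPlan
  def cancel(): Unit
  def doMaterialize(): Future[Any]
  def getRuntimeStatistics: Statistics
  def newReuseInstance(newStageId: Int, newOutput: Seq[Attribute]): QueryStageLike
}

// A toy implementation; real implementations submit Spark jobs instead.
final case class DummyStage(id: Int, plan: SparkPlan) extends QueryStageLike {
  def cancel(): Unit = () // nothing is ever in progress in this toy stage
  def doMaterialize(): Future[Any] = Future.successful("done")
  def getRuntimeStatistics: Statistics = Statistics(sizeInBytes = 0)
  // Assumption: reuse is modelled as a copy with a new ID and a plan exposing the new output.
  def newReuseInstance(newStageId: Int, newOutput: Seq[Attribute]): QueryStageLike =
    copy(id = newStageId, plan = LocalPlan(newOutput))
}
```

In this sketch, `DummyStage(0, LocalPlan(Seq(Attribute("a")))).newReuseInstance(1, Seq(Attribute("b")))` gives a stage with `id` 1 whose plan outputs `b`.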
Implementations¶
Result¶
_resultOption: AtomicReference[Option[Any]]
QueryStageExec uses a _resultOption transient volatile internal variable (of type AtomicReference) for the result of a successful materialization of this QueryStageExec operator (when preparing for query execution):
- Broadcast variable (broadcasting data) for BroadcastQueryStageExec
- MapOutputStatistics (submitting map stages) for ShuffleQueryStageExec
Because AtomicReference is mutable, updating the value it holds is enough; the reference itself does not have to be replaced.
_resultOption is set when AdaptiveSparkPlanExec physical operator is requested for the final physical plan.
_resultOption is available using resultOption.
resultOption¶
resultOption: AtomicReference[Option[Any]]
resultOption returns the current value of the _resultOption registry.
resultOption is used when:
- AdaptiveSparkPlanExec is requested to getFinalPhysicalPlan (to set the value)
- QueryStageExec is requested to isMaterialized
- ShuffleQueryStageExec is requested for the MapOutputStatistics
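The result registry and its accessor can be illustrated with a small standalone sketch. `ResultHolder` and `ResultHolderDemo` are hypothetical names (not Spark classes); only the `AtomicReference[Option[Any]]` type and the `_resultOption` / `resultOption` names come from the description above.

```scala
import java.util.concurrent.atomic.AtomicReference

// Simplified holder that mirrors only the result-tracking part of a query stage.
class ResultHolder {
  // Starts out empty (None); the var is never reassigned here, only its content changes.
  @transient @volatile
  protected var _resultOption = new AtomicReference[Option[Any]](None)

  // Accessor that hands out the same AtomicReference to callers.
  def resultOption: AtomicReference[Option[Any]] = _resultOption
}

object ResultHolderDemo {
  def main(args: Array[String]): Unit = {
    val holder = new ResultHolder
    println(holder.resultOption.get().isDefined) // false: nothing materialized yet
    // The caller updates the content through the accessor.
    holder.resultOption.set(Some("map output statistics"))
    println(holder.resultOption.get().isDefined) // true
  }
}
```

Since the accessor returns the reference rather than a copy of its content, a `set` made through `resultOption` is visible to every later `get`.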
Computing Runtime Statistics¶
computeStats(): Option[Statistics]
Only when this QueryStageExec has been materialized, computeStats gives a new Statistics based on the runtime statistics (and flips the isRuntime flag to true).
Otherwise, computeStats returns no statistics (None).
computeStats is used when:
- LogicalQueryStage logical operator is requested for the Statistics
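The materialized-or-nothing behaviour of computeStats can be sketched as follows. This is a simplified model under stated assumptions: `StageStats`, its `markMaterialized` helper and the three-field `Statistics` case class are hypothetical stand-ins, and the real Statistics class carries more fields than shown here.

```scala
import java.util.concurrent.atomic.AtomicReference

// Stand-in for Spark's Statistics (hypothetical, reduced to three fields).
final case class Statistics(
    sizeInBytes: BigInt,
    rowCount: Option[BigInt] = None,
    isRuntime: Boolean = false)

class StageStats(runtime: Statistics) {
  private val result = new AtomicReference[Option[Any]](None)

  // Hypothetical helper that plays the role of whoever sets the result.
  def markMaterialized(value: Any): Unit = result.set(Some(value))

  def isMaterialized: Boolean = result.get().isDefined
  def getRuntimeStatistics: Statistics = runtime

  // Statistics are only reported once the stage has a result.
  def computeStats(): Option[Statistics] =
    if (isMaterialized) Some(getRuntimeStatistics.copy(isRuntime = true))
    else None
}
```

Before `markMaterialized` is called, `computeStats()` returns `None`; afterwards it returns the runtime statistics with `isRuntime` set to `true`.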
isMaterialized¶
isMaterialized: Boolean
isMaterialized checks whether or not the resultOption has a value.
isMaterialized is used when:
- AdaptiveSparkPlanExec is requested to createQueryStages
- AQEPropagateEmptyRelation logical optimization is requested for an estimated row count and isRelationWithAllNullKeys
- DynamicJoinSelection logical optimization is requested to selectJoinStrategy
- ShuffleStage is requested to extract a materialized ShuffleQueryStageExec (for OptimizeSkewedJoin physical optimization)
- QueryStageExec is requested to computeStats
Materializing Query Stage¶
materialize(): Future[Any]
materialize prints out the following DEBUG message to the logs (with the id):
Materialize query stage [simpleName]: [id]
In the end, materialize calls doMaterialize and returns its Future.
Final Method
materialize is a Scala final method and may not be overridden in subclasses.
Learn more in the Scala Language Specification.
materialize is used when:
- AdaptiveSparkPlanExec physical operator is requested to getFinalPhysicalPlan
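The split between the final materialize and the overridable doMaterialize is a template-method arrangement, which the following sketch illustrates. `StageSketch`, `ConstantStage`, `debugLog` and `simpleName` are hypothetical; `debugLog` merely stands in for the DEBUG logger so the message can be inspected.

```scala
import scala.collection.mutable.ArrayBuffer
import scala.concurrent.Future

abstract class StageSketch(val id: Int) {
  // Stand-in for the logger: collects the DEBUG messages instead of printing them.
  val debugLog: ArrayBuffer[String] = ArrayBuffer.empty[String]

  // Assumption: the simple name of the concrete class is used in the message.
  protected def simpleName: String = getClass.getSimpleName

  // The part that subclasses provide.
  protected def doMaterialize(): Future[Any]

  // final: subclasses cannot change the log-then-delegate sequence.
  final def materialize(): Future[Any] = {
    debugLog += s"Materialize query stage $simpleName: $id"
    doMaterialize()
  }
}

// Toy subclass that completes immediately with a fixed value.
class ConstantStage(stageId: Int, value: Any) extends StageSketch(stageId) {
  protected def doMaterialize(): Future[Any] = Future.successful(value)
}
```

Calling `new ConstantStage(7, "ok").materialize()` records one message ending in `: 7` and returns an already completed Future holding `"ok"`. Attempting to override `materialize` in `ConstantStage` would be rejected by the compiler.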
Text Representation¶
generateTreeString(
  depth: Int,
  lastChildren: Seq[Boolean],
  append: String => Unit,
  verbose: Boolean,
  prefix: String = "",
  addSuffix: Boolean = false,
  maxFields: Int,
  printNodeId: Boolean,
  indent: Int = 0): Unit
generateTreeString is part of the TreeNode abstraction.
generateTreeString first runs the default generateTreeString (of the parent TreeNode) and then requests the plan for its generateTreeString (with the depth incremented).
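The idea of a leaf node that still prints a wrapped plan one level deeper can be shown with a toy tree. This is only an illustration: `Node`, `Op` and `StageNode` are hypothetical, the signature is reduced to `depth` and `append`, and the two-space indentation is an assumption that does not match Spark's actual tree formatting.

```scala
// Minimal tree with a default text rendering.
trait Node {
  def name: String
  def children: Seq[Node]

  // Default rendering: this node, then its children one level deeper.
  def generateTreeString(depth: Int, append: String => Unit): Unit = {
    append(("  " * depth) + name + "\n")
    children.foreach(_.generateTreeString(depth + 1, append))
  }
}

final case class Op(name: String, children: Seq[Node] = Nil) extends Node

// A leaf that wraps a plan: the plan is not a child, so it is rendered explicitly.
final case class StageNode(name: String, plan: Node) extends Node {
  val children: Seq[Node] = Nil

  override def generateTreeString(depth: Int, append: String => Unit): Unit = {
    super.generateTreeString(depth, append)    // the default rendering of this node
    plan.generateTreeString(depth + 1, append) // then the wrapped plan, depth incremented
  }
}
```

Rendering `StageNode("Stage(0)", Op("Exchange", Seq(Op("Scan"))))` at depth 0 appends three lines, with `Exchange` indented once and `Scan` indented twice, even though the stage itself reports no children.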
Logging¶
QueryStageExec is an abstract class and logging is configured using the logger of the implementations.