QueryStageExec Leaf Physical Operators¶
QueryStageExec is an extension of the LeafExecNode abstraction for leaf physical operators in Adaptive Query Execution.
Contract¶
Cancelling¶
cancel(): Unit
Cancels the stage materialization if in progress; otherwise does nothing.
Used when:
- AdaptiveSparkPlanExec physical operator is requested to cleanUpAndThrowException
Materializing¶
doMaterialize(): Future[Any]
Used when:
- QueryStageExec is requested to materialize
Runtime Statistics¶
getRuntimeStatistics: Statistics
Statistics after stage materialization
See:
Used when:
- AQEPropagateEmptyRelation logical optimization is requested for an estimated row count
- QueryStageExec is requested to compute statistics
Query Stage ID¶
id: Int
Used when:
- CoalesceShufflePartitions adaptive physical optimization is executed
New ShuffleQueryStageExec Instance for Reuse¶
newReuseInstance(
newStageId: Int,
newOutput: Seq[Attribute]): QueryStageExec
Used when:
- AdaptiveSparkPlanExec physical operator is requested to reuseQueryStage
Physical Query Plan¶
plan: SparkPlan
The sub-tree of the main query plan that belongs to this query stage (it acts like a child operator, but QueryStageExec is a LeafExecNode and has no children)
Implementations¶
Result¶
_resultOption: AtomicReference[Option[Any]]
QueryStageExec uses a _resultOption transient volatile internal variable (of type AtomicReference) for the result of a successful materialization of this QueryStageExec operator (when preparing for query execution):
- Broadcast variable (broadcasting data) for BroadcastQueryStageExec
- MapOutputStatistics (submitting map stages) for ShuffleQueryStageExec
Since an AtomicReference is mutable, setting a new value on the reference is enough to update the result.
_resultOption is set when AdaptiveSparkPlanExec physical operator is requested for the final physical plan.
_resultOption is available using resultOption.
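The way a mutable AtomicReference can carry an optional result can be sketched as follows. This is a minimal illustration, not Spark's code: the ResultHolder class is hypothetical, and the stored string only stands in for a broadcast variable or MapOutputStatistics.

```scala
import java.util.concurrent.atomic.AtomicReference

// Hypothetical stand-in for a query stage; this is not Spark's QueryStageExec.
class ResultHolder {
  // Starts empty: no materialization result is available yet.
  private val _resultOption = new AtomicReference[Option[Any]](None)

  // Exposes the reference itself, so a caller can both read and set the value.
  def resultOption: AtomicReference[Option[Any]] = _resultOption
}
```

A caller would update the result with `holder.resultOption.set(Some(value))` and read it back with `holder.resultOption.get()`.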
resultOption¶
resultOption: AtomicReference[Option[Any]]
resultOption returns the current value of the _resultOption registry.
resultOption is used when:
- AdaptiveSparkPlanExec is requested to getFinalPhysicalPlan (to set the value)
- QueryStageExec is requested to isMaterialized
- ShuffleQueryStageExec is requested for the MapOutputStatistics
Computing Runtime Statistics¶
computeStats(): Option[Statistics]
Only when this QueryStageExec has been materialized, computeStats gives a new Statistics based on the runtime statistics (and flips the isRuntime flag to true).
Otherwise, computeStats returns no statistics (None).
computeStats is used when:
- LogicalQueryStage logical operator is requested for the Statistics
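The materialized-or-nothing behaviour of computeStats can be sketched without Spark on the classpath. StageStats and StageModel below are hypothetical, simplified stand-ins (assumptions for illustration); only the isRuntime flag and the Option result come from the description above.

```scala
// Hypothetical, simplified stand-in for Spark's Statistics.
final case class StageStats(sizeInBytes: BigInt, isRuntime: Boolean = false)

// Hypothetical stage model: `result` plays the role of the resultOption value.
final case class StageModel(result: Option[Any], runtimeStats: StageStats) {
  // A stage counts as materialized once a result is present.
  def isMaterialized: Boolean = result.isDefined

  // Statistics are produced only after materialization, flagged as runtime.
  def computeStats(): Option[StageStats] =
    if (isMaterialized) Some(runtimeStats.copy(isRuntime = true)) else None
}
```

With this model, a stage with no result yields None, while a stage holding any result yields its statistics with isRuntime set to true.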
isMaterialized¶
isMaterialized: Boolean
isMaterialized checks whether or not the resultOption has a value.
isMaterialized is used when:
- AdaptiveSparkPlanExec is requested to createQueryStages
- AQEPropagateEmptyRelation logical optimization is requested for an estimated row count and isRelationWithAllNullKeys
- DynamicJoinSelection logical optimization is requested to selectJoinStrategy
- ShuffleStage is requested to extract a materialized ShuffleQueryStageExec (for OptimizeSkewedJoin physical optimization)
- QueryStageExec is requested to computeStats
Materializing Query Stage¶
materialize(): Future[Any]
materialize prints out the following DEBUG message to the logs (with the id):
Materialize query stage [simpleName]: [id]
materialize then calls doMaterialize.
Final Method
materialize is a Scala final method and may not be overridden in subclasses.
Learn more in the Scala Language Specification.
materialize is used when:
- AdaptiveSparkPlanExec physical operator is requested to getFinalPhysicalPlan
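The split between a final materialize and an overridable doMaterialize can be sketched as below. StageSketch and ImmediateStage are hypothetical classes, the explicit name parameter replaces the simple class name, and messages are collected in a buffer instead of being logged; all of these are assumptions made to keep the example self-contained.

```scala
import scala.collection.mutable.ListBuffer
import scala.concurrent.Future

// Hypothetical base class; it models only the final-method pattern.
abstract class StageSketch(val name: String, val id: Int) {
  // Collected instead of logged, so the example needs no logging setup.
  val debugMessages: ListBuffer[String] = ListBuffer.empty[String]

  // Subclasses supply the actual materialization.
  protected def doMaterialize(): Future[Any]

  // Final: subclasses cannot change the report-then-delegate sequence.
  final def materialize(): Future[Any] = {
    debugMessages += s"Materialize query stage $name: $id"
    doMaterialize()
  }
}

// Hypothetical implementation whose future is already completed.
class ImmediateStage(id: Int) extends StageSketch("ImmediateStage", id) {
  protected def doMaterialize(): Future[Any] = Future.successful("result")
}
```

Keeping materialize final means every implementation reports the same message before its own doMaterialize runs.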
Text Representation¶
generateTreeString(
depth: Int,
lastChildren: Seq[Boolean],
append: String => Unit,
verbose: Boolean,
prefix: String = "",
addSuffix: Boolean = false,
maxFields: Int,
printNodeId: Boolean,
indent: Int = 0): Unit
generateTreeString is part of the TreeNode abstraction.
generateTreeString runs the default generateTreeString (of the parent), followed by another generateTreeString (with the depth incremented).
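The two-step rendering can be sketched with a much smaller tree model. NodeSketch and StageNodeSketch are hypothetical, the signature is reduced to two parameters, and the sketch assumes that the second call is made on the wrapped plan; the text above only says that it runs with the depth incremented.

```scala
// Hypothetical minimal tree node; Spark's TreeNode is far richer.
class NodeSketch(val name: String, val children: Seq[NodeSketch] = Nil) {
  // Default rendering: this node, then each child one level deeper.
  def generateTreeString(depth: Int, append: String => Unit): Unit = {
    append(("  " * depth) + name + "\n")
    children.foreach(_.generateTreeString(depth + 1, append))
  }
}

// Hypothetical leaf that wraps a plan without listing it as a child.
class StageNodeSketch(name: String, val plan: NodeSketch) extends NodeSketch(name) {
  override def generateTreeString(depth: Int, append: String => Unit): Unit = {
    // First the default rendering of this (childless) node.
    super.generateTreeString(depth, append)
    // Then the wrapped plan, one level deeper (assumed target of the second call).
    plan.generateTreeString(depth + 1, append)
  }
}
```

Under these assumptions the wrapped plan shows up indented below the stage, even though the stage itself has no children.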
Logging¶
QueryStageExec is an abstract class and logging is configured using the logger of the implementations.