QueryPlan — Structured Query Plan¶
QueryPlan
is an extension of the TreeNode abstraction for query plans in Catalyst Framework.
QueryPlan
is used to build a tree of relational operators of a structured query. QueryPlan
is a tree of (logical or physical) operators that have a tree of expressions.
QueryPlan
has output attributes, expressions and a schema.
QueryPlan
has statePrefix that is used when displaying a plan with !
to indicate an invalid plan, and '
to indicate an unresolved plan.
A QueryPlan
is invalid if it has missing input attributes and its children are non-empty.
A QueryPlan
is unresolved if the column names have not been verified and column types have not been looked up in the Catalog.
Contract¶
Output (Schema) Attributes¶
output: Seq[Attribute]
Output Attributes
val q = spark.range(3)
scala> q.queryExecution.analyzed.output
res0: Seq[org.apache.spark.sql.catalyst.expressions.Attribute] = List(id#0L)
scala> q.queryExecution.withCachedData.output
res1: Seq[org.apache.spark.sql.catalyst.expressions.Attribute] = List(id#0L)
scala> q.queryExecution.optimizedPlan.output
res2: Seq[org.apache.spark.sql.catalyst.expressions.Attribute] = List(id#0L)
scala> q.queryExecution.sparkPlan.output
res3: Seq[org.apache.spark.sql.catalyst.expressions.Attribute] = List(id#0L)
scala> q.queryExecution.executedPlan.output
res4: Seq[org.apache.spark.sql.catalyst.expressions.Attribute] = List(id#0L)
Tip
You can build a StructType from output
attributes using toStructType.
scala> q.queryExecution.analyzed.output.toStructType
res5: org.apache.spark.sql.types.StructType = StructType(StructField(id,LongType,false))
Implementations¶
- AnalysisHelper
- LogicalPlan
- SparkPlan
Expressions¶
expressions: Seq[Expression]
expressions
is all of the expressions present in this query plan operator.
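As a quick illustration, the following sketch lists the expressions of every operator of an analyzed plan. It assumes Spark on the classpath and a local SparkSession (in spark-shell the existing session is reused); the query and the doubled column name are made up for the example.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session; in spark-shell getOrCreate() returns the existing one
val spark = SparkSession.builder().master("local[*]").appName("queryplan-expressions").getOrCreate()

// A made-up query: a projection over a filter over a range
val q = spark.range(5).where("id > 1").selectExpr("id * 2 AS doubled")
val plan = q.queryExecution.analyzed

// expressions covers the current operator only (here: the top-level operator)
plan.expressions.foreach(println)

// Walk the operator tree to see the expressions of every operator
plan.foreach { op =>
  println(s"${op.nodeName}: ${op.expressions.mkString(", ")}")
}
```

Note that expressions is not recursive over operators: child operators have to be visited explicitly, as the foreach above does.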
Expression References¶
references: AttributeSet
Lazy Value
references
is a Scala lazy value to guarantee that the code to initialize it is executed once only (when accessed for the first time) and the computed value never changes afterwards.
Learn more in the Scala Language Specification.
references
is an AttributeSet
of all the Attributes that are referenced by the expressions of this operator (except the produced attributes).
references
is used when:
- QueryPlan is requested for the missing input attributes and to transformUpWithNewOutput
- CodegenSupport is requested for the used input attributes
- others (less interesting?)
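A small sketch of references on a real plan may help here. It assumes Spark on the classpath and a local SparkSession; the query and the doubled alias are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session; in spark-shell getOrCreate() returns the existing one
val spark = SparkSession.builder().master("local[*]").appName("queryplan-references").getOrCreate()

val plan = spark.range(5).selectExpr("id * 2 AS doubled").queryExecution.analyzed

// The top-level operator refers to the id attribute of its child
val refs = plan.references
println(refs)

// The alias is part of the output, but it is not a reference
println(plan.output.map(_.name))
val refNames = refs.map(_.name).toSet
println(refNames)
```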
Transforming Expressions¶
transformExpressions(
rule: PartialFunction[Expression, Expression]): this.type
transformExpressions
executes transformExpressionsDown with the input rule.
transformExpressions
is used when...FIXME
Transforming Expressions (Down The Tree)¶
transformExpressionsDown(
rule: PartialFunction[Expression, Expression]): this.type
transformExpressionsDown
applies the given rule to each expression in the query operator.
transformExpressionsDown
is used when...FIXME
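The following sketch shows transformExpressions rewriting a literal in the expressions of a single operator. It assumes Spark on the classpath and a local SparkSession; the query and the replacement value 100 are made up, and the rule relies on the analyzer keeping 1 as an integer literal (wrapped in a cast).

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.expressions.Literal
import org.apache.spark.sql.types.IntegerType

// Assumption: a local session; in spark-shell getOrCreate() returns the existing one
val spark = SparkSession.builder().master("local[*]").appName("queryplan-transform").getOrCreate()

val plan = spark.range(3).selectExpr("id + 1 AS x").queryExecution.analyzed

// Replace the integer literal 1 with 100 in the expressions of the top operator.
// The rule is a partial function: expressions it does not match are left as they are.
val rewritten = plan.transformExpressions {
  case Literal(1, IntegerType) => Literal(100)
}

println(plan.expressions.mkString(", "))
println(rewritten.expressions.mkString(", "))

// Collect the literal values to compare before and after
val before = plan.expressions.flatMap(_.collect { case l: Literal => l.value })
val after = rewritten.expressions.flatMap(_.collect { case l: Literal => l.value })
```

Query plans are immutable, so the original plan is left untouched and a new operator is returned.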
Output Schema Attribute Set¶
outputSet: AttributeSet
outputSet
simply returns an AttributeSet
for the output attributes.
outputSet
is used when...FIXME
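As an illustration, outputSet is convenient for membership and subset checks where output (a Seq) would need a linear scan. The sketch below assumes Spark on the classpath and a local SparkSession; the query is made up.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session; in spark-shell getOrCreate() returns the existing one
val spark = SparkSession.builder().master("local[*]").appName("queryplan-outputset").getOrCreate()

val plan = spark.range(3).selectExpr("id", "id * 2 AS doubled").queryExecution.analyzed

val out = plan.output        // ordered Seq[Attribute]
val outSet = plan.outputSet  // AttributeSet built from the output attributes

// Membership test
println(outSet.contains(out.head))

// The id attribute of the child passes through this projection
val child = plan.children.head
println(child.outputSet.subsetOf(outSet))
```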
Missing Input Attributes¶
missingInput: AttributeSet
missingInput
are attributes that are referenced in expressions but not provided by this node's children (as inputSet) and are not produced by this node (as producedAttributes).
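To make the definition concrete, the sketch below builds two logical plans by hand, one of which refers to an attribute that its child does not provide. It needs only the Catalyst classes on the classpath (no SparkSession); the attribute names a and b are hypothetical.

```scala
import org.apache.spark.sql.catalyst.expressions.AttributeReference
import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, Project}
import org.apache.spark.sql.types.LongType

// Two hypothetical attributes; only a is provided by the relation
val a = AttributeReference("a", LongType)()
val b = AttributeReference("b", LongType)()
val relation = LocalRelation(a)

val valid = Project(Seq(a), relation)
val invalid = Project(Seq(b), relation)

// a is referenced and provided by the child, so nothing is missing
println(valid.missingInput)

// b is referenced, but neither provided by the child nor produced by the projection
println(invalid.missingInput)
```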
Output Schema¶
You can request the schema of a QueryPlan
using schema
that builds StructType from the output attributes.
// the query
val dataset = spark.range(3)
scala> dataset.queryExecution.analyzed.schema
res6: org.apache.spark.sql.types.StructType = StructType(StructField(id,LongType,false))
Simple (Basic) Description with State Prefix¶
simpleString: String
simpleString
adds a state prefix to the node's simple text description.
simpleString
is part of the TreeNode abstraction.
State Prefix¶
statePrefix: String
Internally, statePrefix gives ! (exclamation mark) when the node is invalid, i.e. missingInput is not empty and the node has children. Otherwise, statePrefix gives an empty string.
statePrefix
is used when QueryPlan
is requested for the simple text node description.
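The prefixes can be observed in the text representation of hand-built logical plans. The sketch below is an illustration only: it needs the Catalyst classes on the classpath (no SparkSession), the attribute names are hypothetical, and it relies on LogicalPlan adding the ' prefix for unresolved plans on top of the behaviour described above.

```scala
import org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute
import org.apache.spark.sql.catalyst.expressions.AttributeReference
import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, Project}
import org.apache.spark.sql.types.LongType

// Hypothetical attributes; only a is provided by the relation
val a = AttributeReference("a", LongType)()
val b = AttributeReference("b", LongType)()
val relation = LocalRelation(a)

val valid = Project(Seq(a), relation)
val invalid = Project(Seq(b), relation)                            // b is a missing input
val unresolved = Project(Seq(UnresolvedAttribute("c")), relation)  // c is not resolved yet

// The first line of every tree carries the state prefix (if any)
Seq(valid, invalid, unresolved).foreach(p => println(p.treeString))
```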
Verbose Description with State Prefix¶
verboseString: String
verboseString
simply returns the simple (basic) description with state prefix.
verboseString
is part of the TreeNode abstraction.
innerChildren¶
innerChildren: Seq[QueryPlan[_]]
innerChildren
simply returns the subqueries.
innerChildren
is part of the TreeNode abstraction.
subqueries¶
subqueries: Seq[PlanType]
subqueries
...FIXME
subqueries
is used when...FIXME
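As a hedged illustration of what subqueries gives (to my understanding, the plans of the subquery expressions found among the expressions of an operator), the sketch below looks for operators with subqueries in an analyzed plan. It assumes Spark on the classpath and a local SparkSession; the SQL query is made up.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session; in spark-shell getOrCreate() returns the existing one
val spark = SparkSession.builder().master("local[*]").appName("queryplan-subqueries").getOrCreate()

// A made-up query with a scalar subquery in the WHERE clause
val q = spark.sql("SELECT id FROM range(5) WHERE id > (SELECT max(id) - 2 FROM range(5))")
val plan = q.queryExecution.analyzed

// Operators whose expressions contain subquery expressions
val withSubqueries = plan.collect { case op if op.subqueries.nonEmpty => op }

withSubqueries.foreach { op =>
  println(op.nodeName)
  op.subqueries.foreach(sub => println(sub.treeString))
}
```

The subquery plans are not regular children of the operator, which is why they are exposed through innerChildren when a plan is displayed.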
simpleStringWithNodeId¶
simpleStringWithNodeId(): String
simpleStringWithNodeId
is part of the TreeNode abstraction.
simpleStringWithNodeId
finds the operatorId tag or defaults to unknown.
simpleStringWithNodeId
uses the nodeName to return the following text:
[nodeName] ([operatorId])
append¶
append[T <: QueryPlan[T]](
plan: => QueryPlan[T],
append: String => Unit,
verbose: Boolean,
addSuffix: Boolean,
maxFields: Int = SQLConf.get.maxToStringFields,
printOperatorId: Boolean = false): Unit
append
...FIXME
append
is used when:
- QueryExecution is requested to simpleString, writePlans and stringWithStats
- ExplainUtils utility is requested to processPlanSkippingSubqueries
Detailed Description (with Operator Id)¶
verboseStringWithOperatorId(): String
verboseStringWithOperatorId
returns the following text (with spark.sql.debug.maxToStringFields configuration property for the number of arguments to this node, if there are any, and the formatted node name):
[formattedNodeName]
Arguments: [argumentString]
verboseStringWithOperatorId
is used when:
- QueryExecution is requested for simple description (and ExplainUtils utility is requested to processPlanSkippingSubqueries)
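One way to see this kind of text, assuming Spark 3.0 or later, is the formatted explain mode, which prints an operator tree followed by a section per operator with its operator id. The sketch below assumes a local SparkSession and uses a made-up query.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.ExplainMode

// Assumption: a local session; in spark-shell getOrCreate() returns the existing one
val spark = SparkSession.builder().master("local[*]").appName("queryplan-formatted").getOrCreate()

val q = spark.range(3).where("id > 1")

// Prints the formatted plan to standard output
q.explain("formatted")

// The same text as a String (e.g. for logging)
val formatted = q.queryExecution.explainString(ExplainMode.fromString("formatted"))
println(formatted)
```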
Formatted Node Name¶
formattedNodeName: String
formattedNodeName
...FIXME
formattedNodeName
is used when:
QueryPlan
is requested for verboseStringWithOperatorId
transformAllExpressionsWithPruning¶
transformAllExpressionsWithPruning(
cond: TreePatternBits => Boolean,
ruleId: RuleId = UnknownRuleId)(
rule: PartialFunction[Expression, Expression]): this.type
transformAllExpressionsWithPruning
...FIXME
transformAllExpressionsWithPruning
is used when:
- QueryPlan is requested for transformAllExpressions and normalizeExpressions
- AnalysisHelper is requested to transformAllExpressionsWithPruning
- PlanSubqueries physical optimization is executed
- PlanDynamicPruningFilters physical optimization is executed
- PlanAdaptiveDynamicPruningFilters physical optimization is executed
- PlanAdaptiveSubqueries physical optimization is executed
- ReuseAdaptiveSubquery physical optimization is executed
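The signature above can be exercised as in the following sketch, which rewrites a literal in all operators of a plan while letting the cond function skip subtrees without any literals. It assumes Spark 3.2 or later and a local SparkSession; the query, the replacement value 10 and the use of the LITERAL tree pattern as the pruning condition are choices made for the example.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.expressions.Literal
import org.apache.spark.sql.catalyst.trees.TreePattern.LITERAL
import org.apache.spark.sql.types.IntegerType

// Assumption: a local session; in spark-shell getOrCreate() returns the existing one
val spark = SparkSession.builder().master("local[*]").appName("queryplan-pruning").getOrCreate()

val plan = spark.range(3).where("id > 1").selectExpr("id + 1 AS x").queryExecution.analyzed

// cond: only visit subtrees that contain at least one literal
// rule: replace the integer literal 1 with 10 (ruleId is left at its default)
val rewritten = plan.transformAllExpressionsWithPruning(_.containsPattern(LITERAL)) {
  case Literal(1, IntegerType) => Literal(10)
}

println(rewritten.treeString)

// Literal values across all operators of the rewritten plan
val literals = rewritten.flatMap(_.expressions.flatMap(_.collect { case l: Literal => l.value }))
```

Unlike transformExpressions, the rule is applied to the expressions of every operator in the plan, so both the filter condition and the projection are rewritten.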
Produced Attributes¶
producedAttributes: AttributeSet
producedAttributes
is empty (and can be overridden by implementations).
producedAttributes
is used when:
- NestedColumnAliasing is requested to unapply (destructure a logical operator)
- QueryPlan is requested for the references
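The sketch below contrasts an operator that keeps the default with one that overrides it. It relies on my understanding that logical leaf operators (such as LocalRelation) report their own output as produced attributes; it needs only the Catalyst classes on the classpath, and the attribute name is hypothetical.

```scala
import org.apache.spark.sql.catalyst.expressions.AttributeReference
import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, Project}
import org.apache.spark.sql.types.LongType

// A hypothetical attribute provided by a local relation
val a = AttributeReference("a", LongType)()
val relation = LocalRelation(a)
val project = Project(Seq(a), relation)

// Project keeps the default (no produced attributes)
println(project.producedAttributes)

// A leaf operator produces its own output attributes
println(relation.producedAttributes)

// Produced attributes are excluded from the references
println(relation.references)
```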
Output Data Ordering Requirements¶
outputOrdering: Seq[SortOrder]
outputOrdering
specifies the Output Data Ordering Requirements of this operator (as SortOrders):
- For logical operators it is global ordering of the data
- For physical operators it is ordering in each partition
outputOrdering
defaults to no ordering (Nil).
outputOrdering
is used when:
- FIXME
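As an illustration, the ordering of a physical plan can be inspected as in the sketch below. It assumes Spark on the classpath and a local SparkSession; the query is made up and sorts in descending order so that the sort is not trivially satisfied by the input.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Assumption: a local session; in spark-shell getOrCreate() returns the existing one
val spark = SparkSession.builder().master("local[*]").appName("queryplan-ordering").getOrCreate()

val sorted = spark.range(5).orderBy(col("id").desc)

// The physical plan before preparation rules (e.g. exchanges) are applied
val physical = sorted.queryExecution.sparkPlan

println(physical.nodeName)
println(physical.outputOrdering)
```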