QueryPlan — Structured Query Plan¶

QueryPlan is an extension of the TreeNode abstraction for query plans in the Catalyst framework.

QueryPlan is used to build a tree of relational operators of a structured query. QueryPlan is a tree of (logical or physical) operators, each of which has a tree of expressions.

QueryPlan has output attributes, expressions and a schema.

QueryPlan has a statePrefix that is prepended to the text representation of a plan: ! indicates an invalid plan and ' indicates an unresolved plan.

A QueryPlan is invalid if it has missing input attributes and its children are non-empty.

A QueryPlan is unresolved if the column names have not been verified and column types have not been looked up in the Catalog.

Contract¶

Output (Schema) Attributes¶

output: Seq[Attribute]

Output Attributes

val q = spark.range(3)

scala> q.queryExecution.analyzed.output
res0: Seq[org.apache.spark.sql.catalyst.expressions.Attribute] = List(id#0L)

scala> q.queryExecution.withCachedData.output
res1: Seq[org.apache.spark.sql.catalyst.expressions.Attribute] = List(id#0L)

scala> q.queryExecution.optimizedPlan.output
res2: Seq[org.apache.spark.sql.catalyst.expressions.Attribute] = List(id#0L)

scala> q.queryExecution.sparkPlan.output
res3: Seq[org.apache.spark.sql.catalyst.expressions.Attribute] = List(id#0L)

scala> q.queryExecution.executedPlan.output
res4: Seq[org.apache.spark.sql.catalyst.expressions.Attribute] = List(id#0L)

Tip

You can build a StructType from output attributes using toStructType.

scala> q.queryExecution.analyzed.output.toStructType
res5: org.apache.spark.sql.types.StructType = StructType(StructField(id,LongType,false))

Implementations¶

AnalysisHelper
LogicalPlan
SparkPlan

Expressions¶

expressions: Seq[Expression]

expressions are all the expressions present in this query plan operator.

Expression References¶

references: AttributeSet

Lazy Value

references is a Scala lazy value to guarantee that the code to initialize it is executed once only (when accessed for the first time) and the computed value never changes afterwards.

Learn more in the Scala Language Specification.
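The once-only initialization described above can be seen in a few lines of plain Scala. This is a minimal sketch that needs no Spark; LazyDemo, initCount and the attribute names are hypothetical and only stand in for the real references value.

```scala
// Plain Scala, no Spark required. All names here are hypothetical.
object LazyDemo {
  // Counts how many times the initializer of `references` runs.
  var initCount = 0

  // The right-hand side is evaluated on first access only; the result is cached.
  lazy val references: Set[String] = {
    initCount += 1
    Set("id", "name")
  }
}
```

Reading LazyDemo.references twice leaves LazyDemo.initCount at 1, since the second read returns the cached value.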

references is an AttributeSet of all the Attributes that are referenced by the expressions of this operator (except the produced attributes).
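The rule above (everything the expressions reference, minus what the operator produces itself) can be sketched as a simplified model. It is not Spark's code: attributes are reduced to plain names, and ReferencesModel and Expr are hypothetical stand-ins for the Catalyst types.

```scala
// A simplified model in plain Scala; not the Catalyst implementation.
object ReferencesModel {
  // Hypothetical stand-in for an expression: it only knows which attributes it mentions.
  final case class Expr(references: Set[String])

  // Union of the attributes referenced by all expressions,
  // minus the attributes the operator produces itself.
  def references(expressions: Seq[Expr], producedAttributes: Set[String]): Set[String] =
    expressions.flatMap(_.references).toSet -- producedAttributes
}
```

For two expressions that reference a, b and b, c, with c produced by the operator, the model yields Set("a", "b").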

references is used when:

QueryPlan is requested for the missing input attributes and to transformUpWithNewOutput
CodegenSupport is requested for the used input attributes
others (less interesting?)

Transforming Expressions¶

transformExpressions(
  rule: PartialFunction[Expression, Expression]): this.type

transformExpressions executes transformExpressionsDown with the input rule.

transformExpressions is used when...FIXME

Transforming Expressions (Down The Tree)¶

transformExpressionsDown(
  rule: PartialFunction[Expression, Expression]): this.type

transformExpressionsDown applies the given rule to each expression in the query operator.
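To illustrate what applying a partial-function rule top-down means, here is a small self-contained model. It is only a sketch: the Expr hierarchy (Lit, Attr, Add) and the helper names are hypothetical, and an operator is modelled as nothing more than the sequence of its expressions.

```scala
// A toy model in plain Scala; the types below are hypothetical, not Catalyst classes.
object TransformModel {
  sealed trait Expr
  final case class Lit(value: Int) extends Expr
  final case class Attr(name: String) extends Expr
  final case class Add(left: Expr, right: Expr) extends Expr

  // Top-down (pre-order): try the rule on the node first,
  // then recurse into the children of whatever the rule returned.
  def transformDown(e: Expr)(rule: PartialFunction[Expr, Expr]): Expr = {
    val afterRule = rule.applyOrElse(e, (x: Expr) => x)
    afterRule match {
      case Add(l, r) => Add(transformDown(l)(rule), transformDown(r)(rule))
      case leaf      => leaf
    }
  }

  // Apply the rule to every expression of the (modelled) operator.
  def transformExpressionsDown(expressions: Seq[Expr])(
      rule: PartialFunction[Expr, Expr]): Seq[Expr] =
    expressions.map(transformDown(_)(rule))
}
```

With a rule such as { case Add(Lit(a), Lit(b)) => Lit(a + b) }, the expression Add(Attr("id"), Add(Lit(1), Lit(2))) becomes Add(Attr("id"), Lit(3)). Nodes the rule is not defined for are left untouched, which is why a PartialFunction is a convenient shape for a rule.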

transformExpressionsDown is used when...FIXME

Output Schema Attribute Set¶

outputSet: AttributeSet

outputSet simply returns an AttributeSet for the output attributes.

outputSet is used when...FIXME

Missing Input Attributes¶

missingInput: AttributeSet

missingInput are attributes that are referenced in expressions but not provided by this node's children (as inputSet) and are not produced by this node (as producedAttributes).
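The definition above is a set difference, which the following sketch spells out. It is a simplified model with attributes reduced to plain names; MissingInputModel and its parameter names are hypothetical.

```scala
// A simplified model in plain Scala; not the Catalyst implementation.
object MissingInputModel {
  def missingInput(
      references: Set[String],         // attributes used by the operator's expressions
      inputSet: Set[String],           // attributes provided by the children
      producedAttributes: Set[String]  // attributes the operator creates itself
  ): Set[String] =
    references -- inputSet -- producedAttributes
}
```

For references a, b, c with a provided by the children and c produced by the node, only b is missing.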

Output Schema¶

You can request the schema of a QueryPlan using schema, which builds a StructType from the output attributes.

// the query
val dataset = spark.range(3)

scala> dataset.queryExecution.analyzed.schema
res6: org.apache.spark.sql.types.StructType = StructType(StructField(id,LongType,false))

Simple (Basic) Description with State Prefix¶

simpleString: String

simpleString adds a state prefix to the node's simple text description.

simpleString is part of the TreeNode abstraction.

State Prefix¶

statePrefix: String

Internally, statePrefix gives ! (exclamation mark) when the node is invalid, i.e. missingInput is not empty and the node has children. Otherwise, statePrefix gives an empty string.

statePrefix is used when QueryPlan is requested for the simple text node description.
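The prefix rule and its use in the simple description can be modelled as follows. This is a sketch only: Node is a hypothetical, stripped-down stand-in that keeps just the fields the rule needs, and it does not cover the ' prefix of unresolved plans.

```scala
// A simplified model in plain Scala; Node is a hypothetical stand-in for a plan node.
object StatePrefixModel {
  final case class Node(name: String, missingInput: Set[String], children: Seq[Node]) {
    // "!" only when attributes are missing AND the node has children.
    def statePrefix: String =
      if (missingInput.nonEmpty && children.nonEmpty) "!" else ""

    // The simple description is the node's text with the state prefix in front.
    def simpleString: String = statePrefix + name
  }
}
```

A leaf with missing attributes keeps an empty prefix, while a parent such as Node("Project", Set("x"), Seq(leaf)) is rendered as !Project.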

Verbose Description with State Prefix¶

verboseString: String

verboseString simply returns the simple (basic) description with state prefix.

verboseString is part of the TreeNode abstraction.

innerChildren¶

innerChildren: Seq[QueryPlan[_]]

innerChildren simply returns the subqueries.

innerChildren is part of the TreeNode abstraction.

subqueries¶

subqueries: Seq[PlanType]

subqueries...FIXME

subqueries is used when...FIXME

simpleStringWithNodeId¶

simpleStringWithNodeId(): String

simpleStringWithNodeId is part of the TreeNode abstraction.

simpleStringWithNodeId finds the operatorId tag or defaults to unknown.

simpleStringWithNodeId uses the nodeName to return the following text:

[nodeName] ([operatorId])
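The lookup-with-default and the resulting text can be sketched in plain Scala. The tag store is modelled as a simple Map and the "operatorId" key is a hypothetical stand-in for the tag; only the output format is taken from the description above.

```scala
// A simplified model in plain Scala; the Map-based tag store is an assumption.
object NodeIdModel {
  def simpleStringWithNodeId(nodeName: String, tags: Map[String, Int]): String = {
    // Use the operator id if the tag is present, otherwise fall back to "unknown".
    val operatorId = tags.get("operatorId").map(_.toString).getOrElse("unknown")
    s"$nodeName ($operatorId)"
  }
}
```

With the tag set to 1, a node named Range is rendered as Range (1); without the tag it is rendered as Range (unknown).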

append¶

append[T <: QueryPlan[T]](
  plan: => QueryPlan[T],
  append: String => Unit,
  verbose: Boolean,
  addSuffix: Boolean,
  maxFields: Int = SQLConf.get.maxToStringFields,
  printOperatorId: Boolean = false): Unit

append...FIXME

append is used when:

QueryExecution is requested to simpleString, writePlans and stringWithStats
ExplainUtils utility is requested to processPlanSkippingSubqueries

Detailed Description (with Operator Id)¶

verboseStringWithOperatorId(): String

verboseStringWithOperatorId returns the following text: the formatted node name followed by the arguments of this node, if there are any (with the spark.sql.debug.maxToStringFields configuration property limiting the number of fields in the argument string):

[formattedNodeName]
Arguments: [argumentString]
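The "if there are any" condition on the arguments can be sketched like this. It is a simplified model: both inputs are passed in as plain strings, VerboseModel is a hypothetical name, and the exact whitespace of the real output is not reproduced.

```scala
// A simplified model in plain Scala; exact spacing of the real output is not reproduced.
object VerboseModel {
  def verboseStringWithOperatorId(formattedNodeName: String, argumentString: String): String =
    if (argumentString.nonEmpty) s"$formattedNodeName\nArguments: $argumentString"
    else formattedNodeName // no "Arguments:" line when there is nothing to show
}
```

An empty argument string yields just the formatted node name; a non-empty one adds a second line that starts with Arguments:.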

verboseStringWithOperatorId is used when:

QueryExecution is requested for simple description (and ExplainUtils utility is requested to processPlanSkippingSubqueries)

Formatted Node Name¶

formattedNodeName: String

formattedNodeName...FIXME

formattedNodeName is used when:

QueryPlan is requested for verboseStringWithOperatorId

transformAllExpressionsWithPruning¶

transformAllExpressionsWithPruning(
  cond: TreePatternBits => Boolean,
  ruleId: RuleId = UnknownRuleId)(
  rule: PartialFunction[Expression, Expression]): this.type

transformAllExpressionsWithPruning...FIXME

transformAllExpressionsWithPruning is used when:

QueryPlan is requested for transformAllExpressions and normalizeExpressions
AnalysisHelper is requested to transformAllExpressionsWithPruning
PlanSubqueries physical optimization is executed
PlanDynamicPruningFilters physical optimization is executed
PlanAdaptiveDynamicPruningFilters physical optimization is executed
PlanAdaptiveSubqueries physical optimization is executed
ReuseAdaptiveSubquery physical optimization is executed

Produced Attributes¶

producedAttributes: AttributeSet

producedAttributes is empty (and can be overridden by implementations).

producedAttributes is used when:

NestedColumnAliasing is requested to unapply (destructure a logical operator)
QueryPlan is requested for the references

Output Data Ordering Requirements¶

outputOrdering: Seq[SortOrder]

outputOrdering specifies the Output Data Ordering Requirements of this operator (as SortOrders):

For logical operators it is the global ordering of the data
For physical operators it is the ordering in each partition

outputOrdering defaults to no ordering (Nil).
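The default-plus-override pattern can be modelled in a few lines. This is a sketch in plain Scala: the SortOrder case class here is a hypothetical simplification that merely shares its name with Spark's class, and Operator, Scan and Sort are made-up operators.

```scala
// A simplified model in plain Scala; none of these types are Spark classes.
object OrderingModel {
  final case class SortOrder(column: String, ascending: Boolean = true)

  trait Operator {
    // Default: the operator makes no promise about the order of its output.
    def outputOrdering: Seq[SortOrder] = Nil
  }

  // Keeps the default (no ordering).
  final class Scan extends Operator

  // Overrides the default to report the ordering it establishes.
  final class Sort(order: Seq[SortOrder]) extends Operator {
    override def outputOrdering: Seq[SortOrder] = order
  }
}
```

Here new Scan().outputOrdering is empty, while new Sort(Seq(SortOrder("id"))).outputOrdering returns the ordering the operator was created with.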

outputOrdering is used when:

FIXME