TreeNode — Node in Catalyst Tree
TreeNode is an abstraction of named nodes in Catalyst with zero, one or more children.
Contract
children
children: Seq[BaseType]
Zero, one or more child nodes of the node
simpleStringWithNodeId
simpleStringWithNodeId(): String
One-line description of this node with the node identifier
Used when:
- TreeNode is requested to generateTreeString (with node ID)
verboseString
verboseString(
maxFields: Int): String
One-line verbose description
Used when TreeNode is requested to verboseStringWithSuffix and generateTreeString (with verbose flag enabled)
Implementations
- Block
- Expression
- QueryPlan
Simple Description
simpleString: String
simpleString gives a simple one-line description of a TreeNode.
Internally, simpleString is the <
simpleString is used when TreeNode is requested to generateTreeString (with the verbose flag off).
Numbered String Representation
numberedTreeString: String
numberedTreeString adds numbers to the string representation of this node tree.
numberedTreeString is used primarily for interactive debugging (using apply and p methods).
Getting n-th TreeNode in Tree (for Interactive Debugging)
apply(
number: Int): TreeNode[_]
apply gives number-th tree node in a tree.
apply can be used for interactive debugging.
Internally, apply returns the node at the number position or null.
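The numbering and lookup described above can be sketched with a small stand-in model. This is a hypothetical, simplified Node class, not Spark's TreeNode: it assumes nodes are numbered in pre-order starting at 0, ignores inner children, and omits the tree indentation that the real numbered string carries.

```scala
object NumberedLookupSketch {
  // Hypothetical, simplified node type (not Spark's TreeNode).
  case class Node(name: String, children: Seq[Node] = Nil) {
    // All nodes in pre-order, so index 0 is this node itself.
    def preOrder: Seq[Node] = this +: children.flatMap(_.preOrder)

    // One line per node, prefixed with its two-digit pre-order number.
    def numberedLines: Seq[String] =
      preOrder.zipWithIndex.map { case (n, i) => f"$i%02d ${n.name}" }

    // The number-th node, or null when there is no such node.
    def apply(number: Int): Node = preOrder.lift(number).orNull
  }

  val tree: Node =
    Node("Join", Seq(Node("Filter", Seq(Node("ScanA"))), Node("ScanB")))
}
```

With this model, `NumberedLookupSketch.tree(2)` gives the `ScanA` node, and an out-of-range number gives `null`, which is convenient in a REPL but needs a null check anywhere else.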
Getting n-th BaseType in Tree (for Interactive Debugging)
p(
number: Int): BaseType
p gives number-th tree node in a tree as BaseType for interactive debugging.
Note
p can be used for interactive debugging.
BaseType is the base type of a tree and in Spark SQL can be:
- LogicalPlan for logical plan trees
- SparkPlan for physical plan trees
- Expression for expression trees
String Representation
toString: String
toString is part of Java's java.lang.Object for the string representation of an object, e.g. TreeNode.
toString is a synonym of treeString.
String Representation of All Nodes in Tree
treeString: String // (1)
treeString(
verbose: Boolean,
addSuffix: Boolean = false,
maxFields: Int = SQLConf.get.maxToStringFields,
printOperatorId: Boolean = false): String
treeString(
append: String => Unit,
verbose: Boolean,
addSuffix: Boolean,
maxFields: Int,
printOperatorId: Boolean): Unit
(1) verbose flag is enabled (true)
printOperatorId
printOperatorId argument is false by default and seems turned on only when:
- ExplainUtils utility is used to processPlanSkippingSubqueries
treeString returns the string representation of all the nodes in the TreeNode.
treeString is used when:
- QueryPlan is requested to append
- TreeNode is requested for a string representation and numbered string representation
Demo
import org.apache.spark.sql.{functions => f}
val q = spark.range(10).withColumn("rand", f.rand())
val executedPlan = q.queryExecution.executedPlan
val output = executedPlan.treeString(verbose = true)
scala> println(output)
*(1) Project [id#0L, rand(6790207094253656854) AS rand#2]
+- *(1) Range (0, 10, step=1, splits=8)
Verbose Description with Suffix
verboseStringWithSuffix: String
verboseStringWithSuffix simply returns verboseString.
verboseStringWithSuffix is used when TreeNode is requested to generateTreeString (with verbose and addSuffix flags enabled).
Generating Text Representation
generateTreeString(
depth: Int,
lastChildren: Seq[Boolean],
append: String => Unit,
verbose: Boolean,
prefix: String = "",
addSuffix: Boolean = false,
maxFields: Int,
printNodeId: Boolean,
indent: Int = 0): Unit
generateTreeString...FIXME
generateTreeString is used when:
- TreeNode is requested for the text representation of all nodes in the tree
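The following is a minimal sketch of how a recursive renderer with lastChildren and append parameters could produce output such as the Demo above. It is not Spark's implementation: the Node class and the render and treeString helpers are hypothetical. The ":- " and "+- " prefixes are assumed from typical explain output.

```scala
object TreeStringSketch {
  // Hypothetical, simplified node type (not Spark's TreeNode).
  case class Node(name: String, children: Seq[Node] = Nil)

  // lastChildren holds one flag per ancestor level: true when the node
  // on that level was the last child of its parent.
  def render(node: Node, lastChildren: Seq[Boolean], append: String => Unit): Unit = {
    if (lastChildren.nonEmpty) {
      // Continuation columns for the enclosing levels.
      lastChildren.init.foreach(isLast => append(if (isLast) "   " else ":  "))
      // Branch marker for this node.
      append(if (lastChildren.last) "+- " else ":- ")
    }
    append(node.name)
    append("\n")
    node.children.zipWithIndex.foreach { case (child, i) =>
      render(child, lastChildren :+ (i == node.children.size - 1), append)
    }
  }

  // Convenience wrapper that collects the appended pieces into a String.
  def treeString(root: Node): String = {
    val sb = new StringBuilder
    render(root, Nil, s => { sb.append(s); () })
    sb.toString
  }

  val tree: Node =
    Node("Join", Seq(Node("Filter", Seq(Node("ScanA"))), Node("ScanB")))
}
```

Passing an append function rather than returning a String lets a caller stream a large plan to any sink without building intermediate strings for every subtree.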
Inner Child Nodes
innerChildren: Seq[TreeNode[_]]
innerChildren returns the inner nodes that should be shown as an inner nested tree of this node.
innerChildren simply returns an empty collection of TreeNodes.
innerChildren is used when TreeNode is requested to <
allChildren
allChildren: Set[TreeNode[_]]
NOTE: allChildren is a Scala lazy value which is computed once when accessed and cached afterwards.
allChildren...FIXME
allChildren is used when...FIXME
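To illustrate the lazy-value behaviour noted above, here is a small sketch with a hypothetical Node class (not Spark's TreeNode). It assumes that allChildren combines children and innerChildren into a set. The evaluations counter exists only to show that the body runs once.

```scala
object LazyAllChildrenSketch {
  // Counts how many times the lazy value body is evaluated (demo only).
  var evaluations = 0

  // Hypothetical, simplified node type (not Spark's TreeNode).
  case class Node(
      name: String,
      children: Seq[Node] = Nil,
      innerChildren: Seq[Node] = Nil) {
    // Assumption: regular and inner children combined into one set.
    // Computed on first access and cached afterwards.
    lazy val allChildren: Set[Node] = {
      evaluations += 1
      (children ++ innerChildren).toSet
    }
  }

  val node: Node = Node("Project", Seq(Node("Range")), Seq(Node("Subquery")))
}
```

Accessing `node.allChildren` repeatedly leaves `evaluations` at 1, since the set is built only on first access.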
foreach
foreach(f: BaseType => Unit): Unit
foreach applies the input function f to itself (this) first and then (recursively) to the children.
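The visiting order (the node itself first, then its children, recursively) amounts to a pre-order traversal. A minimal sketch, assuming a hypothetical Node class and a visitOrder helper that are not part of Spark:

```scala
object ForeachSketch {
  import scala.collection.mutable.ArrayBuffer

  // Hypothetical, simplified node type (not Spark's TreeNode).
  case class Node(name: String, children: Seq[Node] = Nil) {
    // Apply f to this node first, then recursively to every child.
    def foreach(f: Node => Unit): Unit = {
      f(this)
      children.foreach(_.foreach(f))
    }
  }

  val tree: Node =
    Node("Join", Seq(Node("Filter", Seq(Node("ScanA"))), Node("ScanB")))

  // Records node names in the order foreach visits them.
  def visitOrder(root: Node): Seq[String] = {
    val visited = ArrayBuffer.empty[String]
    root.foreach { n => visited += n.name; () }
    visited.toSeq
  }
}
```

For the sample tree, `visitOrder` yields Join, Filter, ScanA, ScanB: each parent is visited before any of its descendants.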
Node Name
nodeName: String
nodeName returns the name of the class with Exec suffix removed (that is used as a naming convention for the class name of physical operators).
nodeName is used when:
- TreeNode is requested for simpleString and asCode
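The suffix removal can be sketched as a plain string transformation. The helper below is hypothetical: it takes the simple class name as an argument instead of reading it from the class, and it assumes that only a trailing Exec is stripped.

```scala
object NodeNameSketch {
  // Assumption: only a trailing "Exec" is removed; "Exec" elsewhere
  // in the name is left untouched.
  def nodeName(simpleClassName: String): String =
    simpleClassName.replaceAll("Exec$", "")
}
```

For example, `NodeNameSketch.nodeName("ProjectExec")` gives `Project`, while a name without the suffix, such as `Filter`, is returned unchanged.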
Scala Definition
abstract class TreeNode[BaseType <: TreeNode[BaseType]] extends Product {
self: BaseType =>
// ...
}
TreeNode is a recursive data structure that can have one or many children that are again TreeNodes.
Tip
Read up on <: type operator in Scala in Upper Type Bounds.
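The combination of the upper type bound and the self-type in the definition above can be tried out in isolation. The sketch below uses hypothetical Node, Expr, Lit and Add types, which are simplified stand-ins and not Catalyst classes. It shows that the bound lets children be typed as the concrete tree type, and that the self-type lets this be used where a BaseType is expected.

```scala
object FBoundedSketch {
  // Simplified stand-in for the TreeNode definition (not Spark's code).
  abstract class Node[BaseType <: Node[BaseType]] { self: BaseType =>
    def children: Seq[BaseType]

    // Thanks to the self-type, `this` is accepted as a BaseType here.
    def collectAll: Seq[BaseType] = this +: children.flatMap(_.collectAll)
  }

  // A hypothetical expression-like tree whose BaseType is Expr.
  sealed abstract class Expr extends Node[Expr]
  case class Lit(value: Int) extends Expr {
    def children: Seq[Expr] = Nil
  }
  case class Add(left: Expr, right: Expr) extends Expr {
    def children: Seq[Expr] = Seq(left, right)
  }

  val tree: Expr = Add(Lit(1), Add(Lit(2), Lit(3)))
  val all: Seq[Expr] = tree.collectAll
}
```

Because BaseType is fixed to Expr for the whole hierarchy, `collectAll` returns `Seq[Expr]` rather than a sequence of some generic node type, so callers need no casts.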
Scala-specific, TreeNode is an abstract class that is the <
TreeNode therefore allows for building entire trees of TreeNodes, e.g. generic <TreeNodes again).
NOTE: Spark SQL uses TreeNode for <
TreeNode can itself be a node in a tree or a collection of nodes, i.e. itself and the <TreeNode come with the <