
TreeNode — Node in Catalyst Tree

TreeNode is an abstraction of named nodes in Catalyst with zero, one or more children.

Contract

children

children: Seq[BaseType]

Zero, one or more child nodes of the node

simpleStringWithNodeId

simpleStringWithNodeId(): String

One-line description of this node with the node identifier

Used when:

verboseString

verboseString(
  maxFields: Int): String

One-line verbose description

Used when TreeNode is requested to verboseStringWithSuffix and generateTreeString (with verbose flag enabled)

Implementations

Simple Description

simpleString: String

simpleString gives a simple one-line description of a TreeNode.

Internally, simpleString is the nodeName followed by argString separated by a single white space.

simpleString is used when TreeNode is requested for argString (of child nodes) and generateTreeString (with verbose flag off).
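
The composition can be sketched outside Spark with a small stand-in type. Everything below (SketchNode, RangeExec, LocalRelation, SimpleStringDemo) is hypothetical and only mirrors the idea of a node name followed by the arguments; trimming the result when there are no arguments is an assumption of this sketch.

```scala
// Simplified stand-in for a tree node; this is NOT Spark's TreeNode.
trait SketchNode extends Product {
  // productPrefix is the case class name; the Exec suffix is stripped,
  // following the naming convention of physical operators.
  def nodeName: String = productPrefix.replaceAll("Exec$", "")

  // Hypothetical: each node supplies its arguments as ready-made text.
  def argString: String

  // Node name and arguments separated by a single space
  // (trimmed when there are no arguments).
  def simpleString: String = s"$nodeName $argString".trim
}

case class RangeExec(start: Long, end: Long) extends SketchNode {
  def argString: String = s"($start, $end)"
}

case class LocalRelation() extends SketchNode {
  def argString: String = ""
}

object SimpleStringDemo {
  val range: String = RangeExec(0, 10).simpleString  // "Range (0, 10)"
  val local: String = LocalRelation().simpleString   // "LocalRelation"
}
```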

Numbered String Representation

numberedTreeString: String

numberedTreeString adds numbers to the string representation of this node tree.


numberedTreeString is used primarily for interactive debugging (using apply and p methods).

Getting n-th TreeNode in Tree (for Interactive Debugging)

apply(
  number: Int): TreeNode[_]

apply gives number-th tree node in a tree.

apply can be used for interactive debugging.

Internally, apply gets the node at number position or null (when there is no such node).
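
The relationship between numberedTreeString and apply can be illustrated with a toy tree. This is a minimal sketch and not Spark's implementation: TNode and NumberedDemo are hypothetical names, the numbering is assumed to be zero-based and pre-order, and the indentation that a real tree string carries is left out.

```scala
// Simplified stand-in for a tree node; this is NOT Spark's TreeNode.
case class TNode(name: String, children: Seq[TNode] = Nil) {
  // All nodes in pre-order: this node first, then each subtree left to right.
  def preOrder: Seq[TNode] = this +: children.flatMap(_.preOrder)

  // One line per node, prefixed with its two-digit pre-order position.
  def numberedTreeString: String =
    preOrder.zipWithIndex
      .map { case (node, i) => f"$i%02d ${node.name}" }
      .mkString("\n")

  // The node at the given position, or null when there is no such node.
  def apply(number: Int): TNode = preOrder.lift(number).orNull
}

object NumberedDemo {
  val plan: TNode = TNode("Project", Seq(TNode("Filter", Seq(TNode("Range")))))
  val numbered: String = plan.numberedTreeString
  val second: TNode = plan(1)    // the Filter node
  val missing: TNode = plan(10)  // null, no node at that position
}
```

The numbers printed by numberedTreeString are the same numbers accepted by apply, which is what makes the pair convenient in a REPL session.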

Getting n-th BaseType in Tree (for Interactive Debugging)

p(
  number: Int): BaseType

p gives number-th tree node in a tree as BaseType for interactive debugging.

Note

p can be used for interactive debugging.

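BaseType is the base type of a tree and in Spark SQL can be:

  • LogicalPlan for logical plan trees
  • SparkPlan for physical plan trees
  • Expression for expression trees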

String Representation

toString: String

toString is part of Java's java.lang.Object for the string representation of an object, e.g. TreeNode.


toString is a synonym of treeString.

String Representation of All Nodes in Tree

treeString: String // verbose flag is enabled (true)
treeString(
  verbose: Boolean,
  addSuffix: Boolean = false,
  maxFields: Int = SQLConf.get.maxToStringFields,
  printOperatorId: Boolean = false): String
treeString(
  append: String => Unit,
  verbose: Boolean,
  addSuffix: Boolean,
  maxFields: Int,
  printOperatorId: Boolean): Unit

printOperatorId

printOperatorId argument is false by default and seems turned on only when:

  • ExplainUtils utility is used to processPlanSkippingSubqueries

treeString returns the string representation of all the nodes in the TreeNode.


treeString is used when:

Demo

import org.apache.spark.sql.{functions => f}
val q = spark.range(10).withColumn("rand", f.rand())
val executedPlan = q.queryExecution.executedPlan

val output = executedPlan.treeString(verbose = true)

scala> println(output)
*(1) Project [id#0L, rand(6790207094253656854) AS rand#2]
+- *(1) Range (0, 10, step=1, splits=8)

Verbose Description with Suffix

verboseStringWithSuffix: String

verboseStringWithSuffix simply returns verboseString.

verboseStringWithSuffix is used when TreeNode is requested to generateTreeString (with verbose and addSuffix flags enabled).

Generating Text Representation

generateTreeString(
  depth: Int,
  lastChildren: Seq[Boolean],
  append: String => Unit,
  verbose: Boolean,
  prefix: String = "",
  addSuffix: Boolean = false,
  maxFields: Int,
  printNodeId: Boolean,
  indent: Int = 0): Unit

generateTreeString...FIXME


generateTreeString is used when:

Inner Child Nodes

innerChildren: Seq[TreeNode[_]]

innerChildren returns the inner nodes that should be shown as an inner nested tree of this node.

innerChildren simply returns an empty collection of TreeNodes.

innerChildren is used when TreeNode is requested to generateTreeString, allChildren and getNodeNumbered.

allChildren

allChildren: Set[TreeNode[_]]

NOTE: allChildren is a Scala lazy value which is computed once when accessed and cached afterwards.

allChildren...FIXME

allChildren is used when...FIXME

foreach

foreach(f: BaseType => Unit): Unit

foreach applies the input function f to itself (this) first and then (recursively) to the children.
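
The visiting order (the node itself first, then its children, recursively) is a pre-order traversal. The following sketch shows that order on a hypothetical Op type; Op and ForeachDemo are illustrative names only and not part of Spark.

```scala
import scala.collection.mutable.ArrayBuffer

// Simplified stand-in for a tree node; this is NOT Spark's TreeNode.
case class Op(name: String, children: Seq[Op] = Nil) {
  // Apply f to this node first, then recurse into the children left to right.
  def foreach(f: Op => Unit): Unit = {
    f(this)
    children.foreach(_.foreach(f))
  }
}

object ForeachDemo {
  val join: Op = Op("Join", Seq(Op("Filter", Seq(Op("ScanA"))), Op("ScanB")))

  // Record the node names in the order they are visited.
  val visited: ArrayBuffer[String] = ArrayBuffer.empty[String]
  join.foreach(op => visited += op.name)
  // visited now holds: Join, Filter, ScanA, ScanB
}
```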

Node Name

nodeName: String

nodeName returns the name of the class with Exec suffix removed (that is used as a naming convention for the class name of physical operators).

nodeName is used when:

Scala Definition

abstract class TreeNode[BaseType <: TreeNode[BaseType]] extends Product {
  self: BaseType =>
  // ...
}

TreeNode is a recursive data structure that can have zero, one or more children that are again TreeNodes.

Tip

Read up on <: type operator in Scala in Upper Type Bounds.

In Scala terms, TreeNode is an abstract class that is the base class of Catalyst Expression and QueryPlan abstract classes.

TreeNode therefore allows for building entire trees of TreeNodes, e.g. generic query plans with concrete logical and physical operators that both use Catalyst expressions (which are TreeNodes again).

NOTE: Spark SQL uses TreeNode for query plans and Catalyst expressions that can further be used together to build more advanced trees, e.g. Catalyst expressions can have query plans as subquery expressions.

TreeNode can itself be a node in a tree or a collection of nodes, i.e. itself and the children nodes. Not only does TreeNode come with the methods that you may have used in Scala Collection API (https://docs.scala-lang.org/overviews/collections/overview.html), e.g. map, flatMap, collect, collectFirst, foreach, but also specialized ones for more advanced tree manipulation, e.g. mapChildren, transform, transformDown, transformUp, foreachUp, numberedTreeString, p, asCode, prettyJson.

TreeNode abstract type is a fairly advanced Scala type definition (at least compared to the other Scala types in Spark) so understanding its behaviour even outside Spark might be worthwhile by itself.
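
To see what the upper type bound and the self-type buy, consider a stripped-down version of the same pattern. MiniNode, Expr, Lit, Add, withNewChildren and FBoundDemo below are hypothetical names chosen for this sketch (they only imitate the shape of the definition above), and the transformDown shown here is a simplified top-down rewrite, not Spark's code.

```scala
// Simplified stand-in; this is NOT Spark's TreeNode.
abstract class MiniNode[BaseType <: MiniNode[BaseType]] { self: BaseType =>
  def children: Seq[BaseType]

  // Hypothetical helper: rebuild this node with the given children.
  def withNewChildren(newChildren: Seq[BaseType]): BaseType

  // Top-down rewrite. The self-type lets `this` be passed where a BaseType
  // is expected, and the bound lets the result expose children again.
  def transformDown(rule: PartialFunction[BaseType, BaseType]): BaseType = {
    val afterRule: BaseType = rule.applyOrElse(this, identity[BaseType])
    afterRule.withNewChildren(afterRule.children.map(_.transformDown(rule)))
  }
}

abstract class Expr extends MiniNode[Expr]

case class Lit(value: Int) extends Expr {
  def children: Seq[Expr] = Nil
  def withNewChildren(newChildren: Seq[Expr]): Expr = this
}

case class Add(left: Expr, right: Expr) extends Expr {
  def children: Seq[Expr] = Seq(left, right)
  def withNewChildren(newChildren: Seq[Expr]): Expr =
    Add(newChildren(0), newChildren(1))
}

object FBoundDemo {
  val timesTen: PartialFunction[Expr, Expr] = { case Lit(v) => Lit(v * 10) }

  // The result is statically an Expr, not a MiniNode[_], so no cast is needed.
  val result: Expr = Add(Lit(1), Add(Lit(2), Lit(3))).transformDown(timesTen)
}
```

Because BaseType is fixed to Expr in the subclass hierarchy, every tree operation defined once in the base class returns an Expr, which is the main reason for declaring the type this way.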

Node Patterns

TreeNodes can optionally define node patterns for faster query planning (offering a so-called tree traversal pruning as part of SPARK-35042).

Node Patterns are a new feature in Apache Spark 3.2.0.

nodePatterns

nodePatterns: Seq[TreePattern]

nodePatterns is a collection of TreePatterns.

nodePatterns is empty by default (and is supposed to be overridden by the implementations).

Tree Pattern Bits

treePatternBits: BitSet

treePatternBits is the default tree pattern bits.

Lazy Value

treePatternBits is a Scala lazy value to guarantee that the code to initialize it is executed once only (when accessed for the first time) and the computed value never changes afterwards.

treePatternBits is part of the TreePatternBits abstraction.

Default Tree Pattern Bits

getDefaultTreePatternBits: BitSet

getDefaultTreePatternBits is a BitSet with the nodePatterns bits on (true) unioned with the treePatternBits of the children (if any).
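
The union of a node's own pattern bits with those of its children can be sketched as follows. This is an illustration under assumptions: it uses scala.collection.immutable.BitSet from the Scala standard library rather than Spark's own BitSet, and Pattern, PNode and PatternDemo are hypothetical names.

```scala
import scala.collection.immutable.BitSet

// Hypothetical patterns; Spark defines its own TreePattern enumeration.
object Pattern extends Enumeration {
  val FILTER, JOIN, PROJECT, AGGREGATE = Value
}

// Simplified stand-in; this is NOT Spark's TreeNode.
case class PNode(nodePatterns: Seq[Pattern.Value], children: Seq[PNode] = Nil) {
  // Bits of this node's own patterns, unioned with the bits of every child.
  // Computed once on first access and cached afterwards.
  lazy val treePatternBits: BitSet = {
    val own = BitSet(nodePatterns.map(_.id): _*)
    children.foldLeft(own)((bits, child) => bits | child.treePatternBits)
  }

  // A rule can skip a whole subtree when the pattern it targets is absent.
  def containsPattern(p: Pattern.Value): Boolean =
    treePatternBits.contains(p.id)
}

object PatternDemo {
  import Pattern._
  val plan: PNode =
    PNode(Seq(PROJECT), Seq(PNode(Seq(FILTER), Seq(PNode(Nil)))))
  val hasFilter: Boolean = plan.containsPattern(FILTER)  // true, set by a child
  val hasJoin: Boolean = plan.containsPattern(JOIN)      // false, prune here
}
```

Since the bits of a subtree are available at its root, a traversal that only cares about JOIN can stop at plan without visiting any of its descendants.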

getDefaultTreePatternBits is used when:

Tags

tags: Map[TreeNodeTag[_], Any]

TreeNode can have metadata assigned (as a mutable map of tags and their values).

tags can be set, unset and looked up.

tags are copied (from another TreeNode) only when a TreeNode has none defined.
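
The tag operations described here can be modelled with a mutable map keyed by a typed tag. The sketch below is not Spark's code: Tag plays the role of TreeNodeTag, and TaggedNode and TagsDemo are hypothetical names. It mainly shows the copy-only-when-empty behaviour.

```scala
import scala.collection.mutable

// Hypothetical typed key, playing the role of TreeNodeTag[T].
case class Tag[T](name: String)

// Simplified stand-in; this is NOT Spark's TreeNode.
class TaggedNode {
  private val tags: mutable.Map[Tag[_], Any] = mutable.Map.empty[Tag[_], Any]

  def setTagValue[T](tag: Tag[T], value: T): Unit = tags(tag) = value

  def getTagValue[T](tag: Tag[T]): Option[T] =
    tags.get(tag).map(_.asInstanceOf[T])

  def unsetTagValue[T](tag: Tag[T]): Unit = tags -= tag

  // Copy the other node's tags only when this node has none of its own.
  def copyTagsFrom(other: TaggedNode): Unit =
    if (tags.isEmpty) tags ++= other.tags
}

object TagsDemo {
  val planId: Tag[Int] = Tag[Int]("planId")
  val note: Tag[String] = Tag[String]("note")

  val source = new TaggedNode
  source.setTagValue(planId, 42)

  val fresh = new TaggedNode
  fresh.copyTagsFrom(source)   // fresh had no tags, so planId is copied

  val tagged = new TaggedNode
  tagged.setTagValue(note, "keep")
  tagged.copyTagsFrom(source)  // tagged already has a tag, so nothing is copied

  source.unsetTagValue(planId) // does not affect the copy held by fresh
}
```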

Copying Tags

copyTagsFrom(
  other: BaseType): Unit

copyTagsFrom is used when:

Looking Up Tag

getTagValue[T](
  tag: TreeNodeTag[T]): Option[T]

Setting Tag

setTagValue[T](
  tag: TreeNodeTag[T], value: T): Unit

Unsetting Tag

unsetTagValue[T](
  tag: TreeNodeTag[T]): Unit

unsetTagValue is used when:

  • ExplainUtils utility is used to removeTags
  • AdaptiveSparkPlanExec leaf physical operator is requested to cleanUpTempTags

Node Arguments (Comma-Separated Text)

argString(
  maxFields: Int): String

argString concatenates node arguments (based on the stringArgs).

argString is used when:

String Arguments

stringArgs: Iterator[Any]

stringArgs returns all the elements of this node (using Scala's Product.productIterator by default).
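
Product.productIterator is what makes the default work for case classes: it yields the constructor fields in declaration order. A minimal sketch follows, with hypothetical ArgNode, Limit, Scan and ArgsDemo types; the argString here ignores maxFields and simply joins the arguments with a comma, which is a simplification.

```scala
// Simplified stand-in; this is NOT Spark's TreeNode.
trait ArgNode extends Product {
  // By default every constructor field of the case class is an argument.
  protected def stringArgs: Iterator[Any] = productIterator

  // Simplified: join the arguments with ", " (no maxFields handling).
  def argString: String = stringArgs.mkString(", ")
}

case class Limit(n: Int, note: String) extends ArgNode

// A node can override stringArgs to leave a field out of its description.
case class Scan(table: String, secret: String) extends ArgNode {
  override protected def stringArgs: Iterator[Any] = Iterator(table)
}

object ArgsDemo {
  val limit: String = Limit(5, "local").argString    // "5, local"
  val scan: String = Scan("t1", "hidden").argString  // "t1"
}
```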


stringArgs is used when: