Catalyst Tree Manipulation Framework
Catalyst is an execution-agnostic framework to represent and manipulate a dataflow graph as trees of relational operators and expressions.
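To make the idea of "manipulating trees" concrete, here is a minimal, self-contained sketch in Python. It is not Catalyst's Scala API: the class names (`Node`, `Literal`, `Attribute`, `Add`), the `transform_up` helper, and the `fold_constants` rule are all hypothetical names chosen for illustration. It shows an expression represented as an immutable tree and a rewrite applied bottom-up that returns a new tree.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class Node:
    """Base class of the toy tree (hypothetical, not a Catalyst class)."""

    def children(self) -> Tuple["Node", ...]:
        return ()  # leaves have no children

    def with_children(self, new_children: Tuple["Node", ...]) -> "Node":
        return self  # leaves are returned unchanged


@dataclass(frozen=True)
class Literal(Node):
    value: int


@dataclass(frozen=True)
class Attribute(Node):
    name: str


@dataclass(frozen=True)
class Add(Node):
    left: Node
    right: Node

    def children(self) -> Tuple[Node, ...]:
        return (self.left, self.right)

    def with_children(self, new_children: Tuple[Node, ...]) -> Node:
        return Add(*new_children)


# A rule returns a replacement node, or None when it does not apply.
Rule = Callable[[Node], Optional[Node]]


def transform_up(node: Node, rule: Rule) -> Node:
    """Rewrite the children first, then offer the rebuilt node to the rule."""
    rebuilt = node.with_children(
        tuple(transform_up(child, rule) for child in node.children())
    )
    replacement = rule(rebuilt)
    return rebuilt if replacement is None else replacement


def fold_constants(node: Node) -> Optional[Node]:
    """Illustrative rule: replace Add(Literal, Literal) with a single Literal."""
    if (
        isinstance(node, Add)
        and isinstance(node.left, Literal)
        and isinstance(node.right, Literal)
    ):
        return Literal(node.left.value + node.right.value)
    return None


# x + (1 + 2) is rewritten to x + 3; the original tree is left untouched.
tree = Add(Attribute("x"), Add(Literal(1), Literal(2)))
folded = transform_up(tree, fold_constants)
```

Keeping nodes immutable means a rewrite never mutates its input, so the same tree can safely be handed to several rules in turn.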
The Catalyst framework was introduced in [SPARK-1251] Support for optimizing and executing structured queries.
Spark SQL uses the Catalyst framework to build an extensible Optimizer with a number of built-in logical query plan optimizations.
Catalyst supports both rule-based and cost-based optimizations.
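The difference between the two styles can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how Catalyst is implemented: plans are encoded as nested tuples, the two rewrite rules are invented for the example, the cost model (nested-loop comparisons with 1% of pairs matching) is made up, and `cheapest_join_order` only handles exactly three tables. Rule-based optimization reruns a batch of rewrite rules until the plan stops changing; cost-based optimization enumerates candidate plans and keeps the one with the lowest estimated cost.

```python
from itertools import combinations

# Plans are nested tuples: (operator, argument_or_child, ...).
# This encoding is an assumption of the sketch, not Catalyst's representation.


def rewrite_up(plan, rule):
    """Apply `rule` to every node of the plan, children first."""
    rebuilt = tuple(rewrite_up(p, rule) if isinstance(p, tuple) else p for p in plan)
    return rule(rebuilt)


def remove_true_filter(plan):
    """Illustrative rule: filter(true, child) => child."""
    if plan[0] == "filter" and plan[1] == "true":
        return plan[2]
    return plan


def merge_limits(plan):
    """Illustrative rule: limit(n, limit(m, child)) => limit(min(n, m), child)."""
    if plan[0] == "limit" and plan[2][0] == "limit":
        return ("limit", min(plan[1], plan[2][1]), plan[2][2])
    return plan


def optimize_by_rules(plan, rules, max_iterations=100):
    """Rule-based: rerun the batch of rules until a fixed point is reached."""
    for _ in range(max_iterations):
        new_plan = plan
        for rule in rules:
            new_plan = rewrite_up(new_plan, rule)
        if new_plan == plan:
            break
        plan = new_plan
    return plan


def estimate(plan, table_rows):
    """Return (output_rows, cost) under the made-up cost model."""
    if plan[0] == "scan":
        return table_rows[plan[1]], 0
    left_rows, left_cost = estimate(plan[1], table_rows)
    right_rows, right_cost = estimate(plan[2], table_rows)
    pairs = left_rows * right_rows  # comparisons done by a nested-loop join
    return pairs // 100, left_cost + right_cost + pairs  # assume 1% of pairs match


def cheapest_join_order(table_rows):
    """Cost-based: enumerate join orders for three tables, keep the cheapest."""
    tables = sorted(table_rows)
    candidates = []
    for first, second in combinations(tables, 2):
        (last,) = [t for t in tables if t not in (first, second)]
        candidates.append(
            ("join", ("join", ("scan", first), ("scan", second)), ("scan", last))
        )
    return min(candidates, key=lambda plan: estimate(plan, table_rows)[1])


# Rule-based: the trivial filter disappears, then the two limits merge.
plan = ("limit", 10, ("filter", "true", ("limit", 5, ("scan", "t"))))
optimized = optimize_by_rules(plan, [remove_true_filter, merge_limits])

# Cost-based: joining the two small tables first gives the lowest estimate.
table_rows = {"a": 1000, "b": 10, "c": 100}
best = cheapest_join_order(table_rows)
```

In this sketch the rules need no statistics at all, while the cost-based choice depends entirely on the row counts supplied in `table_rows`; different counts can select a different join order.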