Logical Query Plan Analyzer¶
Analyzer (Spark Analyzer or Query Analyzer) is the logical query plan analyzer that validates and transforms an unresolved logical plan to an analyzed logical plan.
Analyzer is a RuleExecutor of rules that transform logical operators (RuleExecutor[LogicalPlan]).

```text
Analyzer: Unresolved Logical Plan ==> Analyzed Logical Plan
```
Analyzer is used by QueryExecution to resolve the managed LogicalPlan (and, as a sort of follow-up, assert that a structured query has already been properly analyzed, i.e. no failed, unresolved or otherwise broken logical plan operators and expressions exist).
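To make the "unresolved plan in, analyzed plan out" contract concrete, here is a minimal sketch in plain Scala, with made-up types that only mirror the idea: a rule is a plan-to-plan function (like Rule[LogicalPlan]), and the executor applies rules repeatedly until the plan stops changing. This is NOT Spark's actual implementation, just the shape of it.

```scala
// Toy "logical plan" nodes standing in for Catalyst operators (illustrative only).
sealed trait Plan
case class UnresolvedAttribute(name: String) extends Plan
case class ResolvedAttribute(name: String, dataType: String) extends Plan
case class Project(column: Plan) extends Plan

// A rule is just a plan transformation, like Rule[LogicalPlan].
type Rule = Plan => Plan

// Hypothetical "catalog" that maps attribute names to data types.
val schema = Map("id" -> "bigint")

// A resolution rule: replace an unresolved attribute with a resolved one.
val resolveAttributes: Rule = {
  case Project(UnresolvedAttribute(n)) if schema.contains(n) =>
    Project(ResolvedAttribute(n, schema(n)))
  case other => other
}

// Apply the rules until a fixed point (or a maximum number of iterations),
// the way a FixedPoint rule batch behaves.
def executeToFixedPoint(plan: Plan, rules: Seq[Rule], maxIterations: Int = 100): Plan = {
  var current = plan
  var iteration = 0
  var changed = true
  while (changed && iteration < maxIterations) {
    val next = rules.foldLeft(current)((p, r) => r(p))
    changed = next != current
    current = next
    iteration += 1
  }
  current
}

val analyzed = executeToFixedPoint(Project(UnresolvedAttribute("id")), Seq(resolveAttributes))
```

The unresolved `Project('id)` becomes a resolved projection once the attribute's type is looked up, after which no rule fires and the executor stops.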
extendedResolutionRules Extension Point¶
```scala
extendedResolutionRules: Seq[Rule[LogicalPlan]] = Nil
```
extendedResolutionRules is an extension point for additional logical evaluation rules for the Resolution batch. The rules are added at the end of the batch.
SessionState uses its own Analyzer with custom extendedResolutionRules, postHocResolutionRules, and extendedCheckRules extension methods.
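The way the extension points combine with the built-in rules can be sketched as follows. This is illustrative plain Scala only (the rule names are made up): extendedResolutionRules are appended at the end of the built-in Resolution rules, while postHocResolutionRules form their own single-pass batch.

```scala
// Built-in resolution rule names (a small, representative subset).
val builtInResolutionRules = Seq("ResolveRelations", "ResolveReferences")

// Hypothetical custom rules a SessionState could contribute via the extension points.
def extendedResolutionRules: Seq[String] = Seq("MyResolutionRule")
def postHocResolutionRules: Seq[String] = Seq("MyPostHocRule")

// Batches as (name, rules) pairs, mirroring how the extension points
// are folded into the analyzer's rule batches.
val batches = Seq(
  "Resolution" -> (builtInResolutionRules ++ extendedResolutionRules),
  "Post-Hoc Resolution" -> postHocResolutionRules
)
```

Note that the custom resolution rule ends up last in the Resolution batch, after all built-in rules.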
postHocResolutionRules Extension Point¶
```scala
postHocResolutionRules: Seq[Rule[LogicalPlan]] = Nil
```
postHocResolutionRules is an extension point for rules in the Post-Hoc Resolution batch if defined (that are executed in one pass, i.e. with the Once strategy).
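The difference between a one-pass batch and a fixed-point batch can be sketched in plain Scala (conceptual only, not Spark source; the batch and rule here operate on strings purely for illustration):

```scala
// A batch strategy bounds how many times the batch's rules may be re-applied.
sealed trait Strategy { def maxIterations: Int }
case object Once extends Strategy { val maxIterations = 1 }
case class FixedPoint(maxIterations: Int) extends Strategy

// A batch is a named sequence of rules with an execution strategy.
case class Batch(name: String, strategy: Strategy, rules: Seq[String => String])

def runBatch(input: String, batch: Batch): String = {
  var current = input
  var i = 0
  var changed = true
  while (changed && i < batch.strategy.maxIterations) {
    val next = batch.rules.foldLeft(current)((s, r) => r(s))
    changed = next != current
    current = next
    i += 1
  }
  current
}

// A toy rule that keeps changing its input until it reaches length 5.
val appendX: String => String = s => if (s.length < 5) s + "x" else s

// A Once batch applies its rules in a single pass even though the rule
// could still change the result; a FixedPoint batch keeps iterating.
val onceResult       = runBatch("a", Batch("Post-Hoc Resolution", Once, Seq(appendX)))
val fixedPointResult = runBatch("a", Batch("Resolution", FixedPoint(100), Seq(appendX)))
```

The Once batch stops after one pass, while the FixedPoint batch iterates until the rule no longer changes its input.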
Simple Sanity Check¶
- Type Coercion Rules
Normalize Alter Table¶
Remove Unresolved Hints¶
Creating Instance¶
Analyzer takes the following to be created:
- Maximum number of iterations (of the FixedPoint rule batches)
Analyzer is created when SessionState is requested for the analyzer.
Analyzer is available as the analyzer property of a session-specific SessionState.

```text
scala> :type spark
org.apache.spark.sql.SparkSession

scala> :type spark.sessionState.analyzer
org.apache.spark.sql.catalyst.analysis.Analyzer
```
You can access the analyzed logical plan of a structured query using the Dataset.explain basic action (with the extended flag enabled) or the EXPLAIN EXTENDED SQL command.
```text
// sample structured query
val inventory = spark
  .range(5)
  .withColumn("new_column", 'id + 5 as "plus5")

// Using explain operator (with extended flag enabled)
scala> inventory.explain(extended = true)
== Parsed Logical Plan ==
'Project [id#0L, ('id + 5) AS plus5#2 AS new_column#3]
+- AnalysisBarrier
      +- Range (0, 5, step=1, splits=Some(8))

== Analyzed Logical Plan ==
id: bigint, new_column: bigint
Project [id#0L, (id#0L + cast(5 as bigint)) AS new_column#3L]
+- Range (0, 5, step=1, splits=Some(8))

== Optimized Logical Plan ==
Project [id#0L, (id#0L + 5) AS new_column#3L]
+- Range (0, 5, step=1, splits=Some(8))

== Physical Plan ==
*(1) Project [id#0L, (id#0L + 5) AS new_column#3L]
+- *(1) Range (0, 5, step=1, splits=8)
```
Alternatively, you can access the analyzed logical plan using QueryExecution and its analyzed property (that together with the numberedTreeString method is a very good "debugging" tool).
```text
val analyzedPlan = inventory.queryExecution.analyzed

scala> println(analyzedPlan.numberedTreeString)
00 Project [id#0L, (id#0L + cast(5 as bigint)) AS new_column#3L]
01 +- Range (0, 5, step=1, splits=Some(8))
```
FixedPoint rule-execution strategy (with maxIterations) for the Hints, Substitution, Resolution and Cleanup batches.
```scala
expandRelationName(
  nameParts: Seq[String]): Seq[String]
```
expandRelationName is used when the ResolveTables and ResolveRelations logical analysis rules are executed.
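As a rough illustration of what "expanding" a relation name means, the sketch below qualifies a bare table name with a catalog and namespace. This is a hypothetical simplification in plain Scala; the catalog and namespace values are made up, and Spark's actual resolution logic is more involved.

```scala
// Hypothetical session defaults (illustrative only).
val currentCatalog = "spark_catalog"
val currentNamespace = Seq("default")

// Expand a single-part relation name with the current catalog and namespace;
// leave an already-qualified name as-is.
def expandRelationName(nameParts: Seq[String]): Seq[String] =
  if (nameParts.length == 1) (currentCatalog +: currentNamespace) :+ nameParts.head
  else nameParts

val expanded = expandRelationName(Seq("t1"))
val kept = expandRelationName(Seq("spark_catalog", "db", "t2"))
```

A bare `t1` becomes fully qualified, while a three-part name passes through unchanged.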
Enable ALL logging level for the respective session-specific loggers to see what happens inside Analyzer:

- org.apache.spark.sql.internal.SessionState$$anon$1
- org.apache.spark.sql.hive.HiveSessionStateBuilder$$anon$1 for Hive support
Add the following line to conf/log4j.properties:
```text
# with no Hive support
log4j.logger.org.apache.spark.sql.internal.SessionState$$anon$1=ALL

# with Hive support enabled
log4j.logger.org.apache.spark.sql.hive.HiveSessionStateBuilder$$anon$1=ALL
```
The reason for such weird-looking logger names is that the analyzer attribute is created as an anonymous subclass of the Analyzer class in the respective SessionStateBuilders.
Refer to Logging.