SparkSqlParser — Default SQL Parser

SparkSqlParser is a SQL parser that extracts Catalyst expressions, plans, and table identifiers from SQL text using SparkSqlAstBuilder (as AstBuilder).

SparkSqlParser is the initial SQL parser in a SparkSession.

SparkSqlParser supports variable substitution.

SparkSqlParser is also used to parse table strings into their corresponding table identifiers.
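As a minimal sketch of table-identifier parsing, the session's parser can be asked to parse a qualified table name directly. The snippet assumes Spark SQL on the classpath and a local session; sales.orders is a hypothetical name, and the table does not need to exist because the text is only parsed.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session; in spark-shell, `spark` is already defined
val spark = SparkSession.builder().master("local[*]").getOrCreate()

// parseTableIdentifier is part of ParserInterface
// "sales.orders" is a hypothetical name used for illustration only
val tid = spark.sessionState.sqlParser.parseTableIdentifier("sales.orders")

assert(tid.table == "orders")
assert(tid.database == Some("sales"))
```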

Creating Instance

SparkSqlParser takes the following to be created:

SparkSqlParser is created when:

  • BaseSessionStateBuilder is requested for a SQL parser

  • expr standard function is used

Parsing Command

parse[T](
  command: String)(
  toResult: SqlBaseParser => T): T

parse is part of the AbstractSqlParser abstraction.


Note

The only reason for overriding the parse method is to allow VariableSubstitution to substitute variables.

parse requests the VariableSubstitution to substitute variables before requesting the default (parent) parser to parse the command.

SparkSqlAstBuilder

SparkSqlParser uses SparkSqlAstBuilder (as AstBuilder).

Accessing SparkSqlParser

SparkSqlParser is available as SessionState.sqlParser (unless...FIXME(note)).

import org.apache.spark.sql.SparkSession
assert(spark.isInstanceOf[SparkSession])

import org.apache.spark.sql.catalyst.parser.ParserInterface
val p = spark.sessionState.sqlParser
assert(p.isInstanceOf[ParserInterface])

import org.apache.spark.sql.execution.SparkSqlParser
assert(spark.sessionState.sqlParser.isInstanceOf[SparkSqlParser])

Translating SQL Statements to Logical Operators

SparkSqlParser is used in SparkSession.sql to translate a SQL text to a logical operator.
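The parsing step can also be observed in isolation. The following is a sketch that assumes a local session and a hypothetical table name t1; parsePlan only parses the text, so the resulting logical plan is still unresolved and nothing is executed.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan

// Assumption: a local session; in spark-shell, `spark` is already defined
val spark = SparkSession.builder().master("local[*]").getOrCreate()

// parsePlan is part of ParserInterface; t1 is a hypothetical table
val plan: LogicalPlan =
  spark.sessionState.sqlParser.parsePlan("SELECT id FROM t1 WHERE id > 1")

// The plan has not been analyzed yet, so it is not resolved
assert(!plan.resolved)

// Print the parsed operator tree
println(plan.numberedTreeString)
```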

Translating SQL Statements to Column API

SparkSqlParser is used to translate an expression to the corresponding Column, e.g. when the expr standard function is used:

scala> expr("token = 'hello'")
16/07/07 18:32:53 INFO SparkSqlParser: Parsing command: token = 'hello'
res0: org.apache.spark.sql.Column = (token = hello)
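The same expression text can be handed to the session's parser directly, which returns a Catalyst Expression rather than a Column. This is a sketch that assumes a local session; the exact string form of the printed expression may differ between Spark versions.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.expressions.Expression
import org.apache.spark.sql.functions.expr

// Assumption: a local session; in spark-shell, `spark` is already defined
val spark = SparkSession.builder().master("local[*]").getOrCreate()

// parseExpression is part of ParserInterface
val e: Expression =
  spark.sessionState.sqlParser.parseExpression("token = 'hello'")
println(e)

// expr wraps the parsed expression in a Column
val c = expr("token = 'hello'")
println(c)
```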

Variable Substitution

SparkSqlParser creates a VariableSubstitution when created.

The VariableSubstitution is used while parsing a SQL command.
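A small sketch of the substitution as seen from user code follows. It assumes a local session and that spark.sql.variable.substitute is left at its default (enabled); greeting is a hypothetical variable name. Note that the SQL string is a plain Scala string (no s interpolator), so the ${greeting} reference reaches the parser unchanged.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session; in spark-shell, `spark` is already defined
val spark = SparkSession.builder().master("local[*]").getOrCreate()

// Hypothetical variable; SET stores it in the session's SQL configuration
spark.sql("SET greeting=hello")

// The ${greeting} reference is substituted before the command is parsed
val row = spark.sql("SELECT '${greeting}' AS g").head()
assert(row.getString(0) == "hello")
```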

Logging

Enable the ALL logging level for the org.apache.spark.sql.execution.SparkSqlParser logger to see what happens inside.

Add the following lines to conf/log4j2.properties:

logger.SparkSqlParser.name = org.apache.spark.sql.execution.SparkSqlParser
logger.SparkSqlParser.level = all

Refer to Logging.