SparkSqlParser — Default SQL Parser¶
SparkSqlParser is a SQL parser that extracts Catalyst expressions, logical plans, and table identifiers from SQL text using SparkSqlAstBuilder (as AstBuilder).
SparkSqlParser is the initial SQL parser in a SparkSession.
SparkSqlParser supports variable substitution.
SparkSqlParser is used to parse table strings into their corresponding table identifiers in the following:
- table methods in DataFrameReader and SparkSession
- insertInto and saveAsTable methods of DataFrameWriter
- createExternalTable and refreshTable methods of Catalog (and SessionState)
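Parsing a table string can also be tried directly against the session's parser. The following is a minimal sketch, assuming a Spark 3.x session in which SparkSession exposes sessionState. The name my_db.my_table is hypothetical and does not have to exist, since only parsing is involved.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.TableIdentifier

// Assumption: a local session; in spark-shell this returns the existing `spark`
val spark = SparkSession.builder().master("local[*]").getOrCreate()

// my_db.my_table is a made-up name; parsing does not check the catalog
val tid: TableIdentifier =
  spark.sessionState.sqlParser.parseTableIdentifier("my_db.my_table")

// The identifier carries the optional database and the table name separately
println(tid.database)
println(tid.table)
```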
Creating Instance¶
SparkSqlParser takes the following to be created:
SparkSqlParser is created when:
- BaseSessionStateBuilder is requested for a SQL parser
- expr standard function is used
Parsing Command¶
parse[T](
command: String)(
toResult: SqlBaseParser => T): T
parse is part of the AbstractSqlParser abstraction.
Note
The only reason for overriding the parse method is to let VariableSubstitution substitute variables.
parse requests the VariableSubstitution to substitute variables before requesting the default (parent) parser to parse the command.
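The effect of the substitution step can be observed from user code. The sketch below assumes that variable substitution is enabled (the spark.sql.variable.substitute configuration property, which is on by default as far as I know) and uses a hypothetical configuration key named greeting.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session; in spark-shell this returns the existing `spark`
val spark = SparkSession.builder().master("local[*]").getOrCreate()

// Hypothetical configuration key, set only for this illustration
spark.conf.set("greeting", "hello")

// A plain (non-interpolated) Scala string, so ${greeting} reaches the parser as-is
val sqlText = "SELECT '${greeting}' AS g"

// The variable reference is replaced with the configuration value before parsing
val value = spark.sql(sqlText).head().getString(0)
println(value)
```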
SparkSqlAstBuilder¶
SparkSqlParser uses SparkSqlAstBuilder (as AstBuilder).
Accessing SparkSqlParser¶
SparkSqlParser is available as SessionState.sqlParser (unless...FIXME(note)).
import org.apache.spark.sql.SparkSession
assert(spark.isInstanceOf[SparkSession])
import org.apache.spark.sql.catalyst.parser.ParserInterface
val p = spark.sessionState.sqlParser
assert(p.isInstanceOf[ParserInterface])
import org.apache.spark.sql.execution.SparkSqlParser
assert(spark.sessionState.sqlParser.isInstanceOf[SparkSqlParser])
Translating SQL Statements to Logical Operators¶
SparkSqlParser is used in SparkSession.sql to translate SQL text to a logical operator.
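To look at the logical operator without executing anything, the parser can be called directly. This is a minimal sketch, assuming a Spark 3.x session that exposes sessionState. The table name t is hypothetical and is never looked up, because parsing produces an unresolved plan.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan

// Assumption: a local session; in spark-shell this returns the existing `spark`
val spark = SparkSession.builder().master("local[*]").getOrCreate()

// t is a made-up table name; parsing alone does not consult the catalog
val plan: LogicalPlan =
  spark.sessionState.sqlParser.parsePlan("SELECT id FROM t WHERE id > 0")

// The parsed plan is not resolved yet; that is the job of the analyzer
println(plan.resolved)
println(plan.treeString)
```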
Translating SQL Statements to Column API¶
SparkSqlParser is used to translate an expression to the corresponding Column in the following:
- expr standard function
- Dataset operators: selectExpr, filter, where
scala> expr("token = 'hello'")
16/07/07 18:32:53 INFO SparkSqlParser: Parsing command: token = 'hello'
res0: org.apache.spark.sql.Column = (token = hello)
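The Dataset operators listed above accept the same expression strings as expr. The sketch below, assuming a local session and a throwaway dataset built with spark.range, shows the string-based and expr-based forms giving the same result.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.expr

// Assumption: a local session; in spark-shell this returns the existing `spark`
val spark = SparkSession.builder().master("local[*]").getOrCreate()

// A single-column dataset `id` with values 0 to 4
val df = spark.range(5)

// String conditions are parsed into Column expressions before use
val viaString = df.filter("id > 2").count()
val viaExpr = df.filter(expr("id > 2")).count()
val viaWhere = df.where("id > 2").count()

// selectExpr parses each string into a Column, including the alias
val doubled = df.selectExpr("id * 2 AS doubled").columns.toSeq
```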
Variable Substitution¶
SparkSqlParser creates a VariableSubstitution when created.
The VariableSubstitution is used while parsing a SQL command.
Logging¶
Enable ALL logging level for org.apache.spark.sql.execution.SparkSqlParser logger to see what happens inside.
Add the following lines to conf/log4j2.properties:
logger.SparkSqlParser.name = org.apache.spark.sql.execution.SparkSqlParser
logger.SparkSqlParser.level = all
Refer to Logging.