SparkSqlParser — Default SQL Parser¶
SparkSqlParser is a SQL parser that extracts Catalyst expressions, logical plans, and table identifiers from SQL text using SparkSqlAstBuilder (as AstBuilder).
SparkSqlParser is the initial SQL parser in a SparkSession.
SparkSqlParser supports variable substitution.
SparkSqlParser is used to parse table strings into their corresponding table identifiers in the following:
- table methods in DataFrameReader and SparkSession
- insertInto and saveAsTable methods of DataFrameWriter
- createExternalTable and refreshTable methods of Catalog (and SessionState)
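The conversion from a table string to a table identifier can be tried directly on the session's parser. The following is a minimal sketch; the table name sales.orders is made up for illustration, and no such table has to exist because the string is only parsed.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.TableIdentifier

// Reuses the active session in spark-shell, otherwise starts a local one
val spark = SparkSession.builder().master("local[*]").getOrCreate()

// parseTableIdentifier is part of the ParserInterface contract
// "sales.orders" is a hypothetical name used only for this example
val tid: TableIdentifier =
  spark.sessionState.sqlParser.parseTableIdentifier("sales.orders")

println(tid.database) // Some(sales)
println(tid.table)    // orders
```

The methods listed above accept the same kind of string and hand it to the parser before looking the table up in the catalog.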
Creating Instance¶
SparkSqlParser takes the following to be created:
SparkSqlParser is created when:
- BaseSessionStateBuilder is requested for a SQL parser
- expr standard function is used
Parsing Command¶
parse[T](
command: String)(
toResult: SqlBaseParser => T): T
parse is part of the AbstractSqlParser abstraction.
Note
The only reason for overriding the parse method is to let VariableSubstitution substitute variables.
parse requests the VariableSubstitution to substitute variables before requesting the default (parent) parser to parse the command.
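The "substitute first, then delegate to the parent" shape of parse can be modelled without Spark. The sketch below uses hypothetical stand-in classes (BaseParser, Substitution, SubstitutingParser), not Spark's own; variables come from a plain Map for brevity, and the "parser" merely trims the text.

```scala
import java.util.regex.Matcher

// Hypothetical stand-in for the parent parser: it only trims the command
class BaseParser {
  def parse[T](command: String)(toResult: String => T): T =
    toResult(command.trim)
}

// Hypothetical stand-in for variable substitution: replaces ${name} references
class Substitution(vars: Map[String, String]) {
  private val ref = """\$\{(\w+)\}""".r
  def substitute(input: String): String =
    ref.replaceAllIn(
      input,
      // Unknown variables are left untouched in this sketch
      m => Matcher.quoteReplacement(vars.getOrElse(m.group(1), m.matched)))
}

// Overrides parse only to substitute variables before delegating to the parent
class SubstitutingParser(vars: Map[String, String]) extends BaseParser {
  private val substitutor = new Substitution(vars)
  override def parse[T](command: String)(toResult: String => T): T =
    super.parse(substitutor.substitute(command))(toResult)
}

val parser = new SubstitutingParser(Map("who" -> "hello"))
val result = parser.parse("  SELECT '${who}'  ")(identity)
println(result) // SELECT 'hello'
```

The snippet is meant to be pasted into a Scala REPL or spark-shell.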
SparkSqlAstBuilder¶
SparkSqlParser uses SparkSqlAstBuilder (as AstBuilder).
Accessing SparkSqlParser¶
SparkSqlParser is available as SessionState.sqlParser (unless...FIXME(note)).
import org.apache.spark.sql.SparkSession
// spark is the SparkSession available in spark-shell
assert(spark.isInstanceOf[SparkSession])
import org.apache.spark.sql.catalyst.parser.ParserInterface
// SessionState exposes the session's SQL parser as a ParserInterface
val p = spark.sessionState.sqlParser
assert(p.isInstanceOf[ParserInterface])
import org.apache.spark.sql.execution.SparkSqlParser
// The concrete parser is a SparkSqlParser
assert(spark.sessionState.sqlParser.isInstanceOf[SparkSqlParser])
Translating SQL Statements to Logical Operators¶
SparkSqlParser is used in SparkSession.sql to translate a SQL text to a logical operator.
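The parsing step alone can be observed by calling the parser directly. This is a minimal sketch: the table t is hypothetical and does not need to exist, since parsing produces an unresolved logical plan and never consults the catalog.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan

// Reuses the active session in spark-shell, otherwise starts a local one
val spark = SparkSession.builder().master("local[*]").getOrCreate()

// Parsing only: table `t` is a made-up name and is not looked up here
val plan: LogicalPlan =
  spark.sessionState.sqlParser.parsePlan("SELECT id FROM t WHERE id > 1")

// Prints the tree of logical operators produced by the parser
println(plan.numberedTreeString)

// The plan still refers to an unresolved relation, so it is not resolved yet
println(plan.resolved) // false
```

SparkSession.sql goes further than this snippet: the parsed plan is then analyzed, which is where a missing table would be reported.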
Translating SQL Statements to Column API¶
SparkSqlParser is used to translate an expression to the corresponding Column in the following:
- expr standard function
- Dataset operators: selectExpr, filter, where
scala> expr("token = 'hello'")
16/07/07 18:32:53 INFO SparkSqlParser: Parsing command: token = 'hello'
res0: org.apache.spark.sql.Column = (token = hello)
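The same expression text can be given to the parser directly or passed to a Dataset operator. A minimal sketch, assuming a small made-up dataset with a token column:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.expressions.{EqualTo, Expression}

// Reuses the active session in spark-shell, otherwise starts a local one
val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// parseExpression is part of the ParserInterface contract
val e: Expression =
  spark.sessionState.sqlParser.parseExpression("token = 'hello'")
println(e.isInstanceOf[EqualTo]) // true

// Hypothetical sample data; filter parses the string into a Column condition
val tokens = Seq("hello", "world").toDF("token")
val matching = tokens.filter("token = 'hello'").count()
println(matching) // 1
```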
Variable Substitution¶
SparkSqlParser creates a VariableSubstitution when created.
The VariableSubstitution is used while parsing a SQL command.
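Variable substitution can be seen end to end with a session configuration property referenced as ${...} in the SQL text. This is a minimal sketch; the property name who is made up for illustration, and spark.sql.variable.substitute is set explicitly even though it is enabled by default.

```scala
import org.apache.spark.sql.SparkSession

// Reuses the active session in spark-shell, otherwise starts a local one
val spark = SparkSession.builder().master("local[*]").getOrCreate()

// Substitution is on by default; set here only to make the dependency visible
spark.conf.set("spark.sql.variable.substitute", "true")

// "who" is a hypothetical property name used only for this example
spark.conf.set("who", "hello")

// A regular (non-interpolated) Scala string, so ${who} reaches the parser as-is
val greeting = spark.sql("SELECT '${who}' AS greeting").head().getString(0)
println(greeting) // hello
```

The reference is replaced in the SQL text before parsing, so the parser only ever sees SELECT 'hello' AS greeting.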
Logging¶
Enable ALL logging level for org.apache.spark.sql.execution.SparkSqlParser logger to see what happens inside.
Add the following lines to conf/log4j2.properties:
logger.SparkSqlParser.name = org.apache.spark.sql.execution.SparkSqlParser
logger.SparkSqlParser.level = all
Refer to Logging.