
SQL Parsing Framework

Spark SQL supports the SQL language through the SQL Parser Framework.

The SQL Parser Framework translates SQL statements into the corresponding relational entities using ANTLR.

What is ANTLR?

ANTLR (ANother Tool for Language Recognition) is a parser generator used to build languages, tools, and frameworks. From a grammar, ANTLR generates a parser that can build and walk parse trees.

The main abstraction is ParserInterface. It is extended by AbstractSqlParser, so concrete SQL parsers only have to provide a custom AstBuilder.

There are two concrete AbstractSqlParsers:

  1. SparkSqlParser, the default parser that translates SQL text into Spark SQL types.
  2. CatalystSqlParser, which is used to parse data types from their canonical string representation.
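
The two parsers can be tried out directly. The following is a minimal sketch, assuming Spark 3.x is on the classpath; the object name `DataTypeParsingDemo`, the application name and the sample inputs are made up for illustration. It relies on `CatalystSqlParser` being usable without a `SparkSession`, while the session's own parser is reached through `spark.sessionState.sqlParser` (an unstable developer API).

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.parser.CatalystSqlParser
import org.apache.spark.sql.types.{ArrayType, DataType, IntegerType}

object DataTypeParsingDemo {
  // CatalystSqlParser parses a data type from its canonical string form.
  // No SparkSession is needed for this call.
  val dataType: DataType = CatalystSqlParser.parseDataType("array<int>")

  def main(args: Array[String]): Unit = {
    assert(dataType == ArrayType(IntegerType, containsNull = true))
    println(dataType)

    // The session-level parser (SparkSqlParser by default) is part of SessionState.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("parser-demo") // hypothetical application name
      .getOrCreate()
    try {
      // parsePlan only parses; the resulting logical plan is still unresolved.
      val plan = spark.sessionState.sqlParser.parsePlan("SELECT 1 AS id")
      println(plan.treeString)
    } finally {
      spark.stop()
    }
  }
}
```

Both parsers return unresolved Catalyst objects; analysis and planning happen later in query execution.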

Example

Let's take a look at the MERGE INTO SQL statement to see in detail how Spark SQL handles this and other SQL statements.

MERGE INTO and UPDATE SQL Statements Not Supported

Partial support for MERGE INTO went into Apache Spark 3.0.0 (as part of SPARK-28893).

The work is not finished yet: the BasicOperators execution planning strategy throws an UnsupportedOperationException for MERGE INTO and UPDATE SQL statements.

MERGE INTO is described in the SqlBaseParser.g4 grammar (as the #mergeIntoTable labeled alternative).

AstBuilder translates a MERGE INTO SQL query into a MergeIntoTable logical command.
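
The parsing step can be observed in isolation. The following is a minimal sketch, assuming Spark 3.x is on the classpath; the object name `MergeIntoParsingDemo` and the table and column names (`customers`, `customer_updates`, `id`, `email`) are hypothetical. Parsing alone does not look tables up, so they do not have to exist.

```scala
import org.apache.spark.sql.catalyst.parser.CatalystSqlParser
import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, MergeIntoTable}

object MergeIntoParsingDemo {
  // Hypothetical tables and columns, used only to exercise the grammar.
  val mergeSql: String =
    """MERGE INTO customers t
      |USING customer_updates s
      |ON t.id = s.id
      |WHEN MATCHED THEN UPDATE SET t.email = s.email
      |WHEN NOT MATCHED THEN INSERT (id, email) VALUES (s.id, s.email)
      |""".stripMargin

  // AstBuilder (behind parsePlan) turns the statement into a logical plan.
  val plan: LogicalPlan = CatalystSqlParser.parsePlan(mergeSql)

  def main(args: Array[String]): Unit = {
    plan match {
      case merge: MergeIntoTable =>
        // The condition and the actions are still unresolved at this point.
        println(s"Merge condition: ${merge.mergeCondition.sql}")
        println(s"Matched actions: ${merge.matchedActions.size}")
        println(s"Not-matched actions: ${merge.notMatchedActions.size}")
      case other =>
        println(s"Unexpected plan: ${other.nodeName}")
    }
  }
}
```

Nothing is executed here. Whether the statement can actually run is decided later, at analysis and physical planning.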

The ResolveReferences logical resolution rule resolves the references of a MergeIntoTable (in the merge condition and in the matched and not-matched actions).

In the end, BasicOperators execution planning strategy throws an UnsupportedOperationException:

MERGE INTO TABLE is not supported temporarily.