CatalystSqlParser¶

CatalystSqlParser is an AbstractSqlParser for DataTypes.

CatalystSqlParser uses AstBuilder for parsing SQL texts.

import org.apache.spark.sql.catalyst.parser.CatalystSqlParser
import org.apache.spark.sql.internal.SQLConf
val catalystSqlParser = new CatalystSqlParser(SQLConf.get)
scala> :type catalystSqlParser.astBuilder
org.apache.spark.sql.catalyst.parser.AstBuilder

CatalystSqlParser is used to translate DataTypes from their canonical string representation (e.g. when adding fields to a schema or casting column to a different data type) or StructTypes.

import org.apache.spark.sql.types.StructType
scala> val struct = new StructType().add("a", "int")
struct: org.apache.spark.sql.types.StructType = StructType(StructField(a,IntegerType,true))

scala> val asInt = expr("token = 'hello'").cast("int")
asInt: org.apache.spark.sql.Column = CAST((token = hello) AS INT)

When parsing, you should see INFO messages in the logs:

Parsing command: int

It is also used in HiveClientImpl (when converting columns from Hive to Spark) and in OrcFileOperator (when inferring the schema for ORC files).

Creating Instance¶

CatalystSqlParser takes the following to be created:

SQLConf

CatalystSqlParser is created when:

SessionCatalog is created

Logging¶

Enable ALL logging level for org.apache.spark.sql.catalyst.parser.CatalystSqlParser logger to see what happens inside.

Add the following line to conf/log4j2.properties:

log4j.logger.org.apache.spark.sql.catalyst.parser.CatalystSqlParser=ALL

Refer to Logging.