SQL
Spark Declarative Pipelines supports the following SQL statements to define data processing pipelines:
- `CREATE FLOW AS INSERT INTO BY NAME`
- `CREATE MATERIALIZED VIEW AS`
- `CREATE STREAMING TABLE`
- `CREATE STREAMING TABLE AS`
- `CREATE VIEW`
- `CREATE TEMPORARY VIEW`
- `SET`
- `SET CATALOG`
- `USE NAMESPACE`
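As a sketch of what such a definitions file can contain, the following uses two of the supported statements with hypothetical table and source names (`daily_orders`, `orders`, `raw_events`, `source_events` are examples, not part of any API):

```sql
-- A materialized view computed from an existing table (names are examples)
CREATE MATERIALIZED VIEW daily_orders AS
SELECT order_date, count(*) AS order_count
FROM orders
GROUP BY order_date;

-- A streaming table continuously ingested from a streaming source
CREATE STREAMING TABLE ingested_events AS
SELECT * FROM STREAM source_events;
```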
Pipeline elements are defined in files with the `.sql` file extension. The SQL files are included as libraries in a pipeline specification file.
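A minimal pipeline specification file could reference the SQL files with a glob pattern. The snippet below is a sketch only; the exact schema of the spec file (assumed here to be YAML with a `libraries` section of `glob` entries) should be verified against your Spark version:

```yaml
name: my_pipeline
libraries:
  - glob:
      include: transformations/**/*.sql
```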
`SqlGraphRegistrationContext` is used on the Spark Connect Server to handle SQL statements (from SQL definition files and Python decorators).
A streaming table can be defined without a query, since a streaming table's data can be backed by standalone flows. If no query is specified in the `CREATE` statement itself, pipeline execution validates that at least one standalone flow writes to the table.
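The pattern above can be sketched as the following pair of statements, using hypothetical table names (`events`, `raw_events`): the streaming table is declared without a query, and a standalone flow provides its data.

```sql
-- Streaming table with no query: its data is backed by standalone flows
CREATE STREAMING TABLE events;

-- Standalone flow writing into the streaming table
CREATE FLOW ingest_events AS
INSERT INTO events BY NAME
SELECT * FROM STREAM raw_events;
```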