Parameterized Queries

Parameterized Queries (Parameterized SQL) let Spark SQL developers write SQL statements with parameter markers that are bound at execution time to parameter values (literals), either by name or by position.

Parameterized Queries are meant to improve security and reusability, and to help prevent SQL injection attacks in applications that generate SQL at runtime (e.g., based on a user's selections, often made through a user interface).

Parameterized Queries support named and positional parameters. The SQL parser recognizes them by the following parameter markers:

  • : (colon) followed by name for named parameters
  • ? (question mark) for positional parameters

A query with a named parameter:

WITH a AS (SELECT 1 c)
SELECT *
FROM a
LIMIT :limitA

A query with a positional parameter:

WITH a AS (SELECT 1 c)
SELECT *
FROM a
LIMIT ?

Parameterized Queries are executed using the SparkSession.sql operator (marked as experimental).

sql(
  sqlText: String,
  args: Map[String, Any]): DataFrame
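
As a minimal sketch of how the operator above might be called, the following binds the LIMIT value of the example queries at execution time. The object and method names (ParameterizedQueriesSketch, withNamedLimit, withPositionalLimit) are hypothetical and used for illustration only. The positional variant assumes a Spark version that also provides the sql(sqlText: String, args: Array[_]) overload (Spark 3.5 or later, to the best of my knowledge); the named variant uses the signature shown above.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helper object, not part of Spark
object ParameterizedQueriesSketch {

  // Named parameter: the map key is the marker name without the leading colon
  def withNamedLimit(spark: SparkSession, limit: Int): DataFrame =
    spark.sql(
      "WITH a AS (SELECT 1 c) SELECT * FROM a LIMIT :limitA",
      Map[String, Any]("limitA" -> limit))

  // Positional parameter: values are bound to ? markers in order
  // (assumes the Array-based overload is available in the Spark version in use)
  def withPositionalLimit(spark: SparkSession, limit: Int): DataFrame =
    spark.sql(
      "WITH a AS (SELECT 1 c) SELECT * FROM a LIMIT ?",
      Array(limit))

  def main(args: Array[String]): Unit = {
    // Local session for demonstration purposes only
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("parameterized-queries-sketch")
      .getOrCreate()

    withNamedLimit(spark, 1).show()
    withPositionalLimit(spark, 1).show()

    spark.stop()
  }
}
```

The argument values are passed separately from the SQL text and bound as literals, so they are not spliced into the query string by the application.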

The Parameterized Queries feature was introduced in [SPARK-41271] Parameterized SQL.

Internals