Skip to content

Default Columns

Apache Spark 3.4 introduces support for DEFAULT columns in the following SQL statements:

  • ALTER TABLE ADD (COLUMN | COLUMNS)
  • ALTER TABLE REPLACE COLUMNS
  • CREATE TABLE
  • (CREATE OR)? REPLACE TABLE

Default Columns are columns defined with DEFAULT clause.

defaultExpression
    : DEFAULT expression
    ;

colDefinitionOption
    : NOT NULL
    | defaultExpression
    | generationExpression
    | commentSpec
    ;

createOrReplaceTableColType
    : colName dataType colDefinitionOption*
    ;

qualifiedColTypeWithPosition
    : multipartIdentifier dataType (NOT NULL)? defaultExpression? commentSpec? colPosition?
    ;

Default Columns are enabled using spark.sql.defaultColumn.enabled configuration property.

With DEFAULT columns, INSERT, UPDATE, MERGE statements can reference the value using the DEFAULT keyword.

CREATE TABLE T(a INT, b INT NOT NULL);

-- The default default is NULL
INSERT INTO T VALUES (DEFAULT, 0);
INSERT INTO T(b) VALUES (1);
SELECT * FROM T;
-- (NULL, 0)
-- (NULL, 1)

-- Adding a default to a table with rows, sets the values for the
-- existing rows (exist default) and new rows (current default).
ALTER TABLE T ADD COLUMN c INT DEFAULT 5;
INSERT INTO T VALUES (1, 2, DEFAULT);
SELECT * FROM T;
-- (NULL, 0, 5)
-- (NULL, 1, 5)
-- (1, 2, 5)
CREATE TABLE t1 (
  i BOOLEAN,
  s BIGINT DEFAULT 42)
USING parquet;

INSERT INTO t1
VALUES
  (true, 41),
  (false, DEFAULT);

CREATE TABLE T2 (
  i BOOLEAN DEFAULT true,
  s BIGINT DEFAULT 42,
  t STRING DEFAULT 'ABC')
USING parquet;

INSERT INTO t2 (i, s)
  SELECT DEFAULT, s
  FROM t1
  ORDER BY s
  LIMIT 1
  OFFSET 1;

Default Columns uses the following configuration properties:

Default Columns are resolved using ResolveDefaultColumns logical resolution rule.

Column Metadata Attributes

Default Columns feature uses the internal column metadata attributes to mark schema fields with default values.

CURRENT_DEFAULT

EXISTS_DEFAULT