Skip to content

MERGE INTO Command

Delta Lake supports merging records into a delta table using the following high-level operators:

Merge command is executed as a transactional MergeIntoCommand.

While running a merge, MergeIntoCommand can choose between the available MergeOutputGenerations (ClassicMergeExecutor or InsertOnlyMergeExecutor) for an optimized output of the merge command to write out.

SQL MERGE (DML Statement)

Quoting Wikipedia:

MERGE (also called upsert) statements are used to simultaneously INSERT new records or UPDATE existing records depending on whether condition matches.

Merge command lets you transactionally execute multiple INSERT, UPDATE, and DELETE DML statements.

Insert-Only Merges

Insert-Only Merges are MERGE queries with WHEN NOT MATCHED clauses only (DeltaMergeIntoNotMatchedClauses precisely that can only be WHEN NOT MATCHED THEN INSERT clauses).

notMatchedClause
    : WHEN NOT MATCHED (BY TARGET)? (AND notMatchedCond)? THEN notMatchedAction
    ;

notMatchedAction
    : INSERT *
    | INSERT (columns) VALUES (expressions)
    ;

Delta Lake applies extra optimizations to insert-only merges only with spark.databricks.delta.merge.optimizeInsertOnlyMerge.enabled enabled.

For insert-only merges, merge becomes a writeOnlyInserts (using InsertOnlyMergeExecutor) (instead of writeAllChanges using ClassicMergeExecutor).

InsertOnlyMergeExecutor uses LEFT ANTI join to find the rows to insert (relying on Data Skipping along the way).

Single Insert-Only Merges

There is a special handling of single INSERT-only MERGEs.

MERGE INTO merge_demo to
USING merge_demo_source from
ON to.id = from.id
WHEN NOT MATCHED THEN INSERT *;

Schema Evolution (Auto Schema Merging)

Merge command supports schema evolution for star and non-star UPDATEs and INSERTs (with columns not in a target delta table) when schema.autoMerge.enabled is enabled.

Explore Star UPDATEs with Nested Columns

Star WHEN MATCHED UPDATEs with nested columns look very interesting.

Explore Schema Evolution and Types

Schema merging allows for implicit conversions so the type of the source may be different in the target (cf. the note at the end of resolveReferencesAndSchema).

Can this lead to any troubles? 🤔

Learn More

Until this note is here, learn more in resolveClause.

With Auto Schema Merging enabled, Delta Lake adds the new columns and nested fields to the target table (assigned to in merge actions of the WHEN MATCHED and the WHEN NOT MATCHED clauses that are not already part of the target schema).

NOT MATCHED BY SOURCE Excluded

NOT MATCHED BY SOURCEs are excluded since they can't reference source columns by definition and thus can't introduce new columns in the target schema.

Matched-Only Merges

A matched-only merge contains one or more WHEN MATCHED clauses only (with neither WHEN NOT MATCHED nor WHEN NOT MATCHED BY SOURCE clauses).

In other words, a matched-only merge is a merge with UPDATEs and DELETEs only, and no WHEN NOT MATCHED clauses.

With merge.optimizeMatchedOnlyMerge.enabled enabled, Delta Lake optimizes matched-only merges to use a RIGHT OUTER join (instead of a FULL OUTER join) while writing out all merge changes.

Change Data Feed

With Change Data Feed enabled on a delta table that is the target table of a merge command, MergeIntoCommand...FIXME

Merge and Joins

Merge command uses different joins when executed.

Repartition Before Write

writeFiles can repartition the output dataframe based on merge.repartitionBeforeWrite.enabled.

Materialized Source

The source of MERGE command can be materialized based on spark.databricks.delta.merge.materializeSource configuration property.

Demo

Demo: Merge Operation

Examples

Conditional Update with Delete

Tip

Use this notebook.

CREATE TABLE source
USING delta
AS VALUES
  (0, 0),
  (1, 10),
  (2, 20) AS data(key, value);
CREATE TABLE target
USING delta
AS VALUES
  (1, 1),
  (2, 2),
  (3, 3) AS data(key, value);
MERGE INTO target t
USING source s
ON s.key = t.key
WHEN MATCHED AND s.key <> 1 THEN UPDATE SET key = s.key, value = s.value
WHEN MATCHED THEN DELETE

Logging

Logging is configured using the logger of the MergeIntoCommand.

Learn More