Change Data Feed

Change Data Feed (CDF), also known as Change Data Capture (CDC), is a Delta Lake feature that tracks row-level changes between versions of a delta table.

With so-called CDC-Aware Table Scan (CDC Read), loading a delta table gives data changes (not the data of a particular version of the delta table).

As they put it (in this comment), CDCReader is the key class used for Change Data Feed (with DelayedCommitProtocol to handle it properly).

Non-CDC data is written out to the base directory of a delta table, while CDC data is written out to the _change_data special folder.
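
The split between regular data files and change data files can be illustrated with a small path check. This is a minimal sketch: CdcPathSketch and isCdcFile are hypothetical names (not Delta Lake API), and the file names in the usage note are made up. Only the _change_data directory name comes from the text above.

```scala
// Hypothetical helper for illustration only; not part of Delta Lake's API.
object CdcPathSketch {
  // Directory (relative to the table's base directory) that holds CDC data.
  val CdcDir = "_change_data"

  /** True if a path relative to the table's base directory points into the CDC folder. */
  def isCdcFile(relativePath: String): Boolean =
    relativePath.split('/').headOption.contains(CdcDir)
}
```

With this sketch, CdcPathSketch.isCdcFile("_change_data/cdc-0001.parquet") is true, while CdcPathSketch.isCdcFile("part-0001.parquet") is false.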

Change Data Feed is a new feature in Delta Lake 2.0.0 (that was tracked under Support for Change Data Feed in Delta Lake #1105).

Enabling CDF for a Delta table

Enable CDF for a table using the delta.enableChangeDataFeed table property.

-- Enable CDF on an existing table
ALTER TABLE delta_demo
SET TBLPROPERTIES (delta.enableChangeDataFeed = true);

-- Or enable CDF when creating a table
CREATE TABLE delta_demo (id INT, name STRING, age INT)
USING delta
TBLPROPERTIES (delta.enableChangeDataFeed = true);

Additionally, this property can be set for all new tables by default.

SET spark.databricks.delta.properties.defaults.enableChangeDataFeed = true;
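
The session-level default follows a naming pattern that can be read off the two property names above: the delta. prefix of the table property is replaced with spark.databricks.delta.properties.defaults. (with a trailing dot). The sketch below only illustrates that string mapping. DefaultsKeySketch and sessionDefaultKey are hypothetical names, not Delta Lake API.

```scala
// Hypothetical helper for illustration only; not part of Delta Lake's API.
object DefaultsKeySketch {
  private val TablePrefix = "delta."
  private val SessionDefaultsPrefix = "spark.databricks.delta.properties.defaults."

  /** Maps a table property name to the session property that sets its default for new tables. */
  def sessionDefaultKey(tableProperty: String): String = {
    require(tableProperty.startsWith(TablePrefix), s"expected a '$TablePrefix' table property")
    SessionDefaultsPrefix + tableProperty.stripPrefix(TablePrefix)
  }
}
```

For example, DefaultsKeySketch.sessionDefaultKey("delta.enableChangeDataFeed") gives the key used in the SET statement above.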

Options

Change Data Feed is enabled in batch and streaming queries using the readChangeFeed option.

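// Batch read of the changes between two table versions
// (startingVersion and endingVersion are version numbers defined elsewhere)
spark
  .read
  .format("delta")
  .option("readChangeFeed", "true")
  .option("startingVersion", startingVersion)
  .option("endingVersion", endingVersion)
  .table("source")

// Streaming read of the changes from a given version onwards
spark
  .readStream
  .format("delta")
  .option("readChangeFeed", "true")
  .option("startingVersion", startingVersion)
  .table("source")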

readChangeFeed is used alongside the other CDC options: startingVersion, startingTimestamp, endingVersion, and endingTimestamp.

_change_type Column

The _change_type column represents the type of a change: insert, update_preimage, update_postimage, or delete.

_change_type    Command
delete          DeleteCommand
FIXME
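
To make the change types concrete, the sketch below derives a change feed from two in-memory snapshots of a table, using the four _change_type values that Delta Lake emits (insert, update_preimage, update_postimage, delete). It is a conceptual model only: Delta Lake does not diff snapshots like this, and ChangeFeedSketch, Row, Change and changes are all hypothetical names. The changeType and commitVersion fields stand in for the _change_type and _commit_version columns of a CDC read.

```scala
// Conceptual model for illustration only; Delta Lake does not compute changes this way.
object ChangeFeedSketch {
  final case class Row(id: Int, name: String)
  // changeType mirrors _change_type, commitVersion mirrors _commit_version.
  final case class Change(row: Row, changeType: String, commitVersion: Long)

  /** Row-level changes (keyed by id) that turn `before` into `after`. */
  def changes(before: Seq[Row], after: Seq[Row], commitVersion: Long): Seq[Change] = {
    val previous = before.map(r => r.id -> r).toMap
    val current = after.map(r => r.id -> r).toMap
    val ids = (previous.keySet ++ current.keySet).toSeq.sorted
    ids.flatMap { id =>
      (previous.get(id), current.get(id)) match {
        case (None, Some(n)) =>
          Seq(Change(n, "insert", commitVersion))
        case (Some(o), None) =>
          Seq(Change(o, "delete", commitVersion))
        case (Some(o), Some(n)) if o != n =>
          // An update yields two rows: the old values and the new values.
          Seq(
            Change(o, "update_preimage", commitVersion),
            Change(n, "update_postimage", commitVersion))
        case _ =>
          Seq.empty // unchanged rows do not appear in the feed
      }
    }
  }
}
```

Note that an updated row shows up twice (before and after the update), which is why consumers often filter out update_preimage rows when they only need the latest values.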

Protocol

Change Data Feed requires a minimum protocol version of 0 for readers (that is, no specific reader version) and 4 for writers.
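
The requirement above amounts to a simple version comparison. The following is a minimal sketch under that reading: Protocol here is a local case class written for the example (not Delta Lake's own class), and CdfProtocolSketch and supportsCdf are hypothetical names.

```scala
// Hypothetical model for illustration only; not Delta Lake's own Protocol class.
object CdfProtocolSketch {
  final case class Protocol(minReaderVersion: Int, minWriterVersion: Int)

  // Minimum protocol versions stated above: 0 for readers, 4 for writers.
  val RequiredForCdf: Protocol = Protocol(minReaderVersion = 0, minWriterVersion = 4)

  /** True if a table's protocol meets the minimum versions required for CDF. */
  def supportsCdf(table: Protocol): Boolean =
    table.minReaderVersion >= RequiredForCdf.minReaderVersion &&
      table.minWriterVersion >= RequiredForCdf.minWriterVersion
}
```

Under this sketch a table at reader version 1 and writer version 4 qualifies, while one at writer version 2 does not.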

Column Mapping Not Supported

Change Data Feed reads are currently not supported on tables with column mapping enabled. Attempting one throws a DeltaUnsupportedOperationException.

Demo

Change Data Feed

Learn More

  1. Delta Lake guide