
Deletion Vectors

The Deletion Vectors table feature speeds up a conditional DELETE command (a DELETE executed with a delete condition). Deleted rows are marked as such, and no data files are physically rewritten.

This is why the Deletion Vectors feature is said to soft-delete data.
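The soft-delete idea can be illustrated with a toy, in-memory model. This is a minimal sketch under stated assumptions, not Delta Lake's implementation: the names SoftDeleteSketch, DataFile and FileWithDv are hypothetical, a "data file" is just an immutable vector of ints, and its deletion vector is modelled as a set of 0-based row positions. A conditional delete only adds positions to that set, and reads skip the marked rows.

```scala
// Toy, in-memory model of soft deletes. All names are hypothetical; this is
// not Delta Lake's implementation or API.
object SoftDeleteSketch {

  // An immutable "data file": a soft delete never rewrites its rows.
  final case class DataFile(path: String, rows: Vector[Int])

  // A data file paired with its deletion vector, modelled as the set of
  // positions (0-based row indexes) of rows marked as deleted.
  final case class FileWithDv(file: DataFile, deleted: Set[Int] = Set.empty) {

    // Conditional DELETE: mark the positions of matching rows as deleted.
    // The underlying `file` is carried over unchanged.
    def delete(cond: Int => Boolean): FileWithDv = {
      val matching = file.rows.indices.filter(i => cond(file.rows(i)))
      copy(deleted = deleted ++ matching)
    }

    // Reads return only the rows whose positions are not marked as deleted.
    def scan: Vector[Int] =
      file.rows.zipWithIndex.collect { case (row, i) if !deleted(i) => row }
  }
}

import SoftDeleteSketch._

val original = FileWithDv(DataFile("part-00000.parquet", Vector(1, 2, 3, 4)))
val afterDelete = original.delete(_ % 2 == 0) // like DELETE ... WHERE a % 2 = 0
println(afterDelete.scan)      // Vector(1, 3)
println(afterDelete.file.rows) // Vector(1, 2, 3, 4): the "file" is untouched
```

The design choice to keep DataFile immutable mirrors the point of the feature: the cost of the delete is proportional to the number of marked rows, not to the size of the file.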

Deletion Vectors can be enabled on a delta table using the delta.enableDeletionVectors table property.

Deletion Vectors are used on a delta table when all of the following hold:

  1. spark.databricks.delta.delete.deletionVectors.persistent system-wide configuration property is enabled
  2. delta.enableDeletionVectors table property is enabled
  3. DeletionVectorsTableFeature is supported by the Protocol
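The three conditions above can be sketched as a single predicate. This is a hypothetical illustration, not Delta Lake's code: DvEnablementSketch and deletionVectorsUsed are made-up names, session configuration and table properties are modelled as plain string maps, and the protocol is reduced to the set of feature names it supports (deletionVectors, as shown by DESCRIBE DETAIL in the demo). The sketch also assumes that a missing setting counts as disabled, which may not match Delta Lake's actual defaults.

```scala
// Hypothetical sketch of the three conditions listed above; not Delta Lake's
// actual code. Settings are modelled as plain string maps.
object DvEnablementSketch {
  val PersistentDvConf = "spark.databricks.delta.delete.deletionVectors.persistent"
  val EnableDvProperty = "delta.enableDeletionVectors"
  val DvFeatureName = "deletionVectors"

  // Assumption for this sketch: a missing setting counts as disabled.
  // (Delta Lake's own defaults may differ.)
  private def isEnabled(settings: Map[String, String], key: String): Boolean =
    settings.get(key).exists(_.trim.equalsIgnoreCase("true"))

  def deletionVectorsUsed(
      sessionConf: Map[String, String],
      tableProperties: Map[String, String],
      protocolFeatures: Set[String]): Boolean =
    isEnabled(sessionConf, PersistentDvConf) &&       // 1. system-wide property
      isEnabled(tableProperties, EnableDvProperty) && // 2. table property
      protocolFeatures.contains(DvFeatureName)        // 3. supported by the protocol
}

val conf  = Map(DvEnablementSketch.PersistentDvConf -> "true")
val props = Map(DvEnablementSketch.EnableDvProperty -> "true")
println(DvEnablementSketch.deletionVectorsUsed(conf, props, Set("deletionVectors"))) // true
println(DvEnablementSketch.deletionVectorsUsed(conf, props, Set.empty))              // false
```

Because the conditions are combined with &&, switching off any one of them (for example, the system-wide property) is enough to stop Deletion Vectors from being used.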

REORG TABLE Command

REORG TABLE is used to purge soft-deleted data.
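Purging can be pictured with the same kind of toy model (in Delta Lake itself, the command takes the form REORG TABLE tbl APPLY (PURGE)). The sketch below is hypothetical and does not describe how REORG TABLE is implemented: PurgeSketch, DataFile, FileWithDv and purge are illustrative names. Purging physically rewrites a file that has soft-deleted rows into a new file containing only the surviving rows, after which no deletion vector is needed.

```scala
// Toy model of purging soft-deleted rows. All names are hypothetical; this is
// not how Delta Lake implements REORG TABLE.
object PurgeSketch {
  final case class DataFile(path: String, rows: Vector[Int])

  // A data file plus its deletion vector (0-based positions of deleted rows).
  final case class FileWithDv(file: DataFile, deleted: Set[Int])

  // Purge: physically rewrite the surviving rows into a new file and drop the
  // deletion vector. Files with nothing deleted are kept as-is.
  def purge(f: FileWithDv, newPath: String): FileWithDv =
    if (f.deleted.isEmpty) f
    else {
      val survivors = f.file.rows.zipWithIndex.collect {
        case (row, i) if !f.deleted(i) => row
      }
      FileWithDv(DataFile(newPath, survivors), Set.empty)
    }
}

val softDeleted = PurgeSketch.FileWithDv(
  PurgeSketch.DataFile("part-00000.parquet", Vector(1, 2, 3, 4)),
  deleted = Set(1, 3))
val purged = PurgeSketch.purge(softDeleted, "part-00001.parquet")
println(purged.file.rows) // Vector(1, 3)
println(purged.deleted)   // Set()
```

Skipping files with an empty deletion vector reflects the trade-off: the file rewrite that the soft delete avoided is paid here, so it is only worth doing where rows were actually marked.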

Persistent Deletion Vectors

spark.databricks.delta.delete.deletionVectors.persistent

Demo

Create a delta table with the delta.enableDeletionVectors table property enabled.

CREATE TABLE tbl(a int)
USING delta
TBLPROPERTIES (
  'delta.enableDeletionVectors' = 'true'
)

Describe the details of the delta table using the DESCRIBE DETAIL command.

sql("desc detail tbl")
  .select("name", "properties", "minReaderVersion", "minWriterVersion", "tableFeatures")
  .show(truncate = false)
+-------------------------+-------------------------------------+----------------+----------------+-----------------+
|name                     |properties                           |minReaderVersion|minWriterVersion|tableFeatures    |
+-------------------------+-------------------------------------+----------------+----------------+-----------------+
|spark_catalog.default.tbl|{delta.enableDeletionVectors -> true}|3               |7               |[deletionVectors]|
+-------------------------+-------------------------------------+----------------+----------------+-----------------+
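The minReaderVersion 3 and minWriterVersion 7 values in the output are the protocol versions at which Delta Lake lists table features explicitly by name, and deletionVectors is a feature that both readers and writers must understand. The check can be sketched as follows. ProtocolSketch, Protocol and supportsDeletionVectors are hypothetical names, and the case class is a simplification for illustration, not Delta Lake's Protocol class.

```scala
// Hypothetical sketch of a table-features protocol check; the types are
// illustrative, not Delta Lake's. Versions and the feature name match the
// DESCRIBE DETAIL output above.
object ProtocolSketch {
  final case class Protocol(
      minReaderVersion: Int,
      minWriterVersion: Int,
      readerFeatures: Set[String],
      writerFeatures: Set[String])

  // Reader version 3 / writer version 7 are where the Delta protocol starts
  // listing supported table features explicitly by name.
  val FeaturesReaderVersion = 3
  val FeaturesWriterVersion = 7

  // deletionVectors affects both readers and writers, so it has to be listed
  // among the reader features and among the writer features.
  def supportsDeletionVectors(p: Protocol): Boolean =
    p.minReaderVersion >= FeaturesReaderVersion &&
      p.minWriterVersion >= FeaturesWriterVersion &&
      p.readerFeatures.contains("deletionVectors") &&
      p.writerFeatures.contains("deletionVectors")
}

val tblProtocol = ProtocolSketch.Protocol(3, 7, Set("deletionVectors"), Set("deletionVectors"))
println(ProtocolSketch.supportsDeletionVectors(tblProtocol)) // true
println(ProtocolSketch.supportsDeletionVectors(
  ProtocolSketch.Protocol(1, 2, Set.empty, Set.empty)))      // false
```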