Deletion Vectors
The Deletion Vectors table feature speeds up a conditional DELETE command (one executed with a delete condition) by marking deleted rows as deleted instead of physically rewriting the data files.
In other words, the Deletion Vectors feature soft-deletes data.
Deletion Vectors can be enabled on a delta table using the delta.enableDeletionVectors table property.
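For a table that already exists, the property can be set after creation. A minimal sketch, assuming a delta table named tbl (the name used in the demo) and a Delta Lake release that supports deletion vectors:

```sql
-- Enable deletion vectors on an existing delta table.
-- Assumption: `tbl` already exists and is a delta table.
ALTER TABLE tbl SET TBLPROPERTIES ('delta.enableDeletionVectors' = 'true');
```

Enabling the property upgrades the table protocol (the demo reports reader version 3 and writer version 7), so older clients may no longer be able to read or write the table.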
Deletion Vectors are used on a delta table when all of the following hold:
- spark.databricks.delta.delete.deletionVectors.persistent system-wide configuration property is enabled
- delta.enableDeletionVectors table property is enabled
- DeletionVectorsTableFeature is supported by the Protocol
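The three conditions can be inspected from a SQL session. This is a sketch only, assuming the demo table tbl; the exact output depends on the Delta Lake version in use.

```sql
-- 1. Current value of the configuration property in this session
SET spark.databricks.delta.delete.deletionVectors.persistent;

-- 2. The table property on the delta table
SHOW TBLPROPERTIES tbl ('delta.enableDeletionVectors');

-- 3. Protocol versions and table features
--    (look for deletionVectors in the tableFeatures column)
DESCRIBE DETAIL tbl;
```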
REORG TABLE Command
REORG TABLE is used to purge soft-deleted data.
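A minimal sketch of a purge, assuming the demo table tbl. With APPLY (PURGE), REORG TABLE rewrites the data files that contain soft-deleted rows so that the rewritten files no longer carry them.

```sql
-- Rewrite data files that have soft-deleted rows.
-- Assumption: `tbl` is a delta table with deletion vectors enabled.
REORG TABLE tbl APPLY (PURGE);
```

The files that were replaced are expected to stay on storage until they are cleaned up by VACUUM, so a purge on its own does not necessarily free disk space right away.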
Persistent Deletion Vectors
spark.databricks.delta.delete.deletionVectors.persistent
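A sketch of setting the property for the current session. The value shown is illustrative, not a statement about the default.

```sql
-- Turn persistent deletion vectors on for DELETE in this session.
SET spark.databricks.delta.delete.deletionVectors.persistent = true;
```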
Demo
Create a delta table with the delta.enableDeletionVectors table property enabled.
CREATE TABLE tbl(a int)
USING delta
TBLPROPERTIES (
  'delta.enableDeletionVectors' = 'true'
)
Describe the detail of the delta table using the DESCRIBE DETAIL command.
sql("desc detail tbl")
  .select("name", "properties", "minReaderVersion", "minWriterVersion", "tableFeatures")
  .show(truncate = false)
+-------------------------+-------------------------------------+----------------+----------------+-----------------+
|name                     |properties                           |minReaderVersion|minWriterVersion|tableFeatures    |
+-------------------------+-------------------------------------+----------------+----------------+-----------------+
|spark_catalog.default.tbl|{delta.enableDeletionVectors -> true}|3               |7               |[deletionVectors]|
+-------------------------+-------------------------------------+----------------+----------------+-----------------+
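A possible continuation of the demo, as a sketch: it exercises the conditional DELETE that Deletion Vectors are meant to speed up. The inserted values are made up for illustration, and whether a deletion vector is actually written depends on the conditions listed earlier.

```sql
-- Add a few rows so there is something to delete.
INSERT INTO tbl VALUES (1), (2), (3);

-- Conditional DELETE: with deletion vectors in effect, the matching row
-- is expected to be marked as deleted rather than rewritten out of its data file.
DELETE FROM tbl WHERE a = 2;

-- The soft-deleted row is filtered out at read time.
SELECT * FROM tbl ORDER BY a;

-- Inspect the operation metrics recorded for the DELETE.
DESCRIBE HISTORY tbl;
```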