DeleteCommand

DeleteCommand is a DeltaCommand that represents a DeltaDelete logical command at execution.

DeleteCommand is a LeafRunnableCommand (Spark SQL) logical operator.
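
Both the SQL DELETE statement and the DeltaTable.delete operator end up as a DeleteCommand at execution. A usage sketch (the table path and the id column are illustrative):

import io.delta.tables.DeltaTable

// SQL DELETE on a (path-based) delta table
spark.sql("DELETE FROM delta.`/tmp/delta/users` WHERE id < 10")

// The equivalent programmatic API
DeltaTable.forPath(spark, "/tmp/delta/users")
  .delete("id < 10")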

Creating Instance

DeleteCommand takes the following to be created:

DeleteCommand is created (also using the apply factory utility) when:

Performance Metrics

metrics: Map[String, SQLMetric]

metrics is part of the RunnableCommand (Spark SQL) abstraction.

metrics creates the performance metrics of this DeleteCommand (to be displayed in the web UI of the SQL query execution).
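
The metrics of a DELETE are also recorded in the table history as operationMetrics. A usage sketch (the table path is illustrative):

// Inspect the metrics of the most recent DELETEs
spark.sql("DESCRIBE HISTORY delta.`/tmp/delta/users`")
  .where("operation = 'DELETE'")
  .select("version", "operationMetrics")
  .show(truncate = false)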

Executing Command

run(
  sparkSession: SparkSession): Seq[Row]

run is part of the RunnableCommand (Spark SQL) abstraction.

run requests the TahoeFileIndex for the DeltaLog (and asserts that the table is removable).

run requests the DeltaLog to start a new transaction for performDelete.

In the end, run re-caches all cached plans (incl. this relation itself) by requesting the CacheManager (Spark SQL) to recache the target.
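
A usage sketch of the re-caching step (the table path and the id column are illustrative): a cached DataFrame over the target table reflects the delete once run has finished, since its cached plan gets refreshed.

val users = spark.read.format("delta").load("/tmp/delta/users")
users.cache()
users.count()   // materializes the cache

spark.sql("DELETE FROM delta.`/tmp/delta/users` WHERE id < 10")

users.count()   // the cached plan has been refreshed and reflects the delete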

performDelete

performDelete(
  sparkSession: SparkSession,
  deltaLog: DeltaLog,
  txn: OptimisticTransaction): Unit

performDelete is used when:

Number of Table Files

performDelete requests the given DeltaLog for the current Snapshot and then the Snapshot for the number of files in the delta table.
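
A minimal sketch of the same figure using the (internal) DeltaLog and Snapshot APIs (the table path is illustrative):

import org.apache.spark.sql.delta.DeltaLog

val deltaLog = DeltaLog.forTable(spark, "/tmp/delta/users")
val snapshot = deltaLog.update()          // the current Snapshot
val numFilesTotal = snapshot.numOfFiles   // number of files in the delta table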

Finding Delete Actions

performDelete branches off based on the optional delete condition (illustrated with the examples after this list):

  1. No condition (delete the whole table)
  2. A condition defined on the table metadata only (e.g., partition columns)
  3. Other conditions
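
A usage sketch of the three cases (the table, the age column and the country partition column are illustrative):

// 1. No condition: the whole table is deleted
spark.sql("DELETE FROM delta.`/tmp/delta/users`")

// 2. Metadata-only condition (references partition columns only)
spark.sql("DELETE FROM delta.`/tmp/delta/users` WHERE country = 'PL'")

// 3. Any other condition (requires scanning and rewriting data files)
spark.sql("DELETE FROM delta.`/tmp/delta/users` WHERE age < 18")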

Delete Condition Undefined

performDelete...FIXME

Metadata-Only Delete Condition

performDelete...FIXME

Other Delete Conditions

performDelete...FIXME

Delete Actions Available

performDelete...FIXME

rewriteFiles

rewriteFiles(
  txn: OptimisticTransaction,
  baseData: DataFrame,
  filterCondition: Expression,
  numFilesToRewrite: Long): Seq[FileAction]

rewriteFiles reads the delta.enableChangeDataFeed table property of the delta table (from the Metadata of the given OptimisticTransaction).
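
For reference, the table property is enabled on the delta table itself (the table path is illustrative):

// Enable Change Data Feed so DELETEs also produce change data files
spark.sql("""
  ALTER TABLE delta.`/tmp/delta/users`
  SET TBLPROPERTIES ('delta.enableChangeDataFeed' = 'true')
""")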

rewriteFiles creates a numTouchedRows metric and a numTouchedRowsUdf UDF to count the number of rows that have been touched.

rewriteFiles creates a DataFrame to write (filtered with the numTouchedRowsUdf UDF and the filterCondition). With Change Data Feed enabled, the DataFrame also includes a _change_type column (with null or delete values based on the filterCondition).

In the end, rewriteFiles requests the given OptimisticTransaction to write the DataFrame.
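
A minimal sketch of the row-counting UDF and the _change_type tagging (numTouchedRows stands for the metric created above, baseData and filterCondition for the values rewriteFiles already has in scope, with filterCondition treated as a Column for brevity; the exact expressions in the Delta Lake sources differ):

import org.apache.spark.sql.functions.{lit, udf, when}

// Count every row read from the files being rewritten
val numTouchedRowsUdf = udf { () => { numTouchedRows += 1; true } }.asNondeterministic()

// Without Change Data Feed: keep only the rows that do not match the delete condition
val dfToWrite = baseData
  .filter(numTouchedRowsUdf())
  .filter(filterCondition)

// With Change Data Feed: keep all rows and tag the deleted ones instead
val dfWithChangeType = baseData
  .filter(numTouchedRowsUdf())
  .withColumn("_change_type", when(filterCondition, lit(null)).otherwise(lit("delete")))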

shouldWritePersistentDeletionVectors

shouldWritePersistentDeletionVectors(
  spark: SparkSession,
  txn: OptimisticTransaction): Boolean

shouldWritePersistentDeletionVectors is enabled (true) when all of the following hold (see the sketch after this list):

  1. spark.databricks.delta.delete.deletionVectors.persistent configuration property is enabled (true)
  2. Protocol and table configuration support deletion vectors feature
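
A usage sketch of the two prerequisites (the table path is illustrative; setting the table property also requires a protocol that supports the deletion vectors table feature):

// 1. The session configuration
spark.conf.set("spark.databricks.delta.delete.deletionVectors.persistent", "true")

// 2. The table property that enables the deletion vectors table feature
spark.sql("""
  ALTER TABLE delta.`/tmp/delta/users`
  SET TBLPROPERTIES ('delta.enableDeletionVectors' = 'true')
""")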

Creating DeleteCommand

apply(
  delete: DeltaDelete): DeleteCommand

apply creates a DeleteCommand.