DeleteCommand¶
DeleteCommand is a DeltaCommand that represents a DeltaDelete logical command at execution time.
DeleteCommand is a LeafRunnableCommand (Spark SQL) logical operator.
Creating Instance¶
DeleteCommand takes the following to be created:
- TahoeFileIndex
- Target Data (LogicalPlan)
- Condition (Expression)
DeleteCommand is created (also using the apply factory utility) when:
- PreprocessTableDelete logical resolution rule is executed (and resolves a DeltaDelete logical command)
Performance Metrics¶
Signature
metrics: Map[String, SQLMetric]
metrics is part of the RunnableCommand (Spark SQL) abstraction.
metrics creates the performance metrics.
Executing Command¶
RunnableCommand
run(
  sparkSession: SparkSession): Seq[Row]
run is part of the RunnableCommand (Spark SQL) abstraction.
run requests the TahoeFileIndex for the DeltaLog and asserts that the table is removable (i.e. that data can be deleted from it, which is not the case for an append-only table).
run requests the DeltaLog to start a new transaction for performDelete.
In the end, run re-caches all cached plans (incl. this relation itself) by requesting the CacheManager (Spark SQL) to recache the target.
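The order of the steps above can be sketched in plain Scala without any Spark or Delta Lake dependency. This is only an illustrative model: the Table class, the trace buffer and the step names are hypothetical stand-ins, and the append-only check is an assumption about what "removable" means, not the actual implementation.

```scala
import scala.collection.mutable.ArrayBuffer

object RunFlowSketch {
  // Hypothetical stand-in for a delta table; `appendOnly` models a table
  // that does not allow data to be removed (an assumption of this sketch).
  final class Table(val appendOnly: Boolean) {
    def assertRemovable(): Unit =
      require(!appendOnly, "cannot delete from an append-only table")
  }

  // Records the steps in the order the text describes them.
  def run(table: Table, trace: ArrayBuffer[String]): Unit = {
    table.assertRemovable()        // fails fast before any transaction starts
    trace += "assertRemovable"
    trace += "startTransaction"    // a new transaction is started for performDelete
    trace += "performDelete"
    trace += "recache"             // cached plans are refreshed at the very end
  }
}
```

The point of the ordering is that the removability check happens before a transaction is opened, and re-caching happens only after the delete has been performed.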
performDelete¶
performDelete(
  sparkSession: SparkSession,
  deltaLog: DeltaLog,
  txn: OptimisticTransaction): Unit
performDelete is used when:
- DeleteCommand is executed
- WriteIntoDelta is requested to removeFiles
Number of Table Files¶
performDelete requests the given DeltaLog for the current Snapshot and, in turn, requests the Snapshot for the number of files in the delta table.
Finding Delete Actions¶
performDelete branches off based on the optional condition:
- No condition to delete the whole table
- Condition defined on metadata only
- Other conditions
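The three-way branching above can be illustrated with a small, dependency-free Scala sketch. It is a simplification under stated assumptions: a condition is reduced to the set of column names it references, and "metadata only" is taken to mean "references partition columns only". The names DeleteBranching, classify, referencedColumns and partitionColumns are hypothetical and are not part of Delta Lake.

```scala
object DeleteBranching {
  sealed trait DeleteCase
  case object WholeTable extends DeleteCase   // no condition: delete the whole table
  case object MetadataOnly extends DeleteCase // condition defined on metadata only
  case object OtherCondition extends DeleteCase // any other condition

  // `referencedColumns` is a hypothetical stand-in for the columns that the
  // optional delete condition references; None means no condition at all.
  def classify(
      referencedColumns: Option[Set[String]],
      partitionColumns: Set[String]): DeleteCase =
    referencedColumns match {
      case None => WholeTable
      // Assumption: a condition is metadata-only when every referenced
      // column is a partition column.
      case Some(cols) if cols.subsetOf(partitionColumns) => MetadataOnly
      case Some(_) => OtherCondition
    }
}
```

For example, with a table partitioned by date, a condition on date alone would fall into the metadata-only branch, while a condition on date and id would not.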
Delete Condition Undefined¶
performDelete...FIXME
Metadata-Only Delete Condition¶
performDelete...FIXME
Other Delete Conditions¶
performDelete...FIXME
Delete Actions Available¶
performDelete...FIXME
rewriteFiles¶
rewriteFiles(
  txn: OptimisticTransaction,
  baseData: DataFrame,
  filterCondition: Expression,
  numFilesToRewrite: Long): Seq[FileAction]
rewriteFiles reads the delta.enableChangeDataFeed table property of the delta table (from the Metadata of the given OptimisticTransaction).
rewriteFiles creates a numTouchedRows metric and a numTouchedRowsUdf UDF to count the number of rows that have been touched.
rewriteFiles creates a DataFrame to write (with the numTouchedRowsUdf UDF and the filterCondition column). The DataFrame can also include a _change_type column (with null or delete values based on the filterCondition).
In the end, rewriteFiles requests the given OptimisticTransaction to write the DataFrame.
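The row-level effect of rewriteFiles can be modelled with plain Scala collections. This is a sketch under assumptions, not the actual implementation: the keep predicate stands in for filterCondition (assumed here to select the rows that stay in the table), Counter stands in for the numTouchedRows metric, and None models a null _change_type. All names in the block are hypothetical.

```scala
object RewriteSketch {
  final case class Row(id: Int, value: String)
  // An output row with an optional _change_type; None models SQL null.
  final case class Rewritten(row: Row, changeType: Option[String])

  // Hypothetical stand-in for the numTouchedRows metric.
  final class Counter { var value: Long = 0L }

  def rewrite(
      baseData: Seq[Row],
      keep: Row => Boolean, // stand-in for filterCondition
      cdfEnabled: Boolean,  // stand-in for delta.enableChangeDataFeed
      numTouchedRows: Counter): Seq[Rewritten] = {
    // Every row read from the files being rewritten counts as touched.
    val touched = baseData.map { r => numTouchedRows.value += 1; r }
    if (cdfEnabled) {
      // With change data feed, deleted rows are retained and tagged "delete".
      touched.map(r => Rewritten(r, if (keep(r)) None else Some("delete")))
    } else {
      // Without it, only the rows that survive the delete are written.
      touched.filter(keep).map(r => Rewritten(r, None))
    }
  }
}
```

In this model the touched-row count is the same in both modes; only the shape of the written data differs.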
shouldWritePersistentDeletionVectors¶
shouldWritePersistentDeletionVectors(
  spark: SparkSession,
  txn: OptimisticTransaction): Boolean
shouldWritePersistentDeletionVectors is enabled (true) when the following all hold:
- spark.databricks.delta.delete.deletionVectors.persistent configuration property is enabled (true)
- Protocol and table configuration support deletion vectors feature
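The conjunction of the two requirements can be written down as a minimal, dependency-free sketch. The configuration is modelled as a plain Map, and treating a missing key as false is an assumption of this sketch only (the real default may differ between Delta Lake versions). The tableSupportsDeletionVectors flag is a hypothetical stand-in for the protocol and table configuration check.

```scala
object DeletionVectorsCheck {
  // Property name as given in the text above.
  val PersistentDVsKey =
    "spark.databricks.delta.delete.deletionVectors.persistent"

  def shouldWritePersistentDeletionVectors(
      conf: Map[String, String],
      tableSupportsDeletionVectors: Boolean): Boolean = {
    // Assumption: an unset property is treated as disabled in this sketch.
    val confEnabled = conf.get(PersistentDVsKey).exists(_.equalsIgnoreCase("true"))
    // Both requirements have to hold.
    confEnabled && tableSupportsDeletionVectors
  }
}
```

Either requirement alone is not enough: enabling the property has no effect on a table that does not support deletion vectors, and vice versa.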
Creating DeleteCommand¶
apply(
  delete: DeltaDelete): DeleteCommand
apply creates a DeleteCommand.