
UpdateCommand

UpdateCommand is a DeltaCommand that represents the DeltaUpdateTable logical command at execution time.

UpdateCommand is a RunnableCommand (Spark SQL) logical operator.

UpdateCommand can use the Deletion Vectors table feature to soft-delete records when executed (based on shouldWritePersistentDeletionVectors).

Creating Instance

UpdateCommand takes the following to be created:

UpdateCommand is created when:

Performance Metrics

Name              web UI description
numAddedFiles     number of files added
numRemovedFiles   number of files removed
numUpdatedRows    number of rows updated
executionTimeMs   time taken to execute the entire operation
scanTimeMs        time taken to scan the files for matches
rewriteTimeMs     time taken to rewrite the matched files
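In the real command these are Spark SQL metrics. As a rough, Spark-free illustration of what each counter accumulates, the following sketch models them with plain counters plus a timing helper for the *TimeMs metrics. UpdateMetricsSketch and its methods are hypothetical names, not Delta Lake or Spark API.

```scala
import scala.collection.mutable

// Hypothetical model of UpdateCommand's performance metrics (not Delta Lake API).
// Only the metric names come from the table above; the real command uses Spark SQL metrics.
final class UpdateMetricsSketch {
  private val counters = mutable.LinkedHashMap[String, Long](
    "numAddedFiles" -> 0L,
    "numRemovedFiles" -> 0L,
    "numUpdatedRows" -> 0L,
    "executionTimeMs" -> 0L,
    "scanTimeMs" -> 0L,
    "rewriteTimeMs" -> 0L
  )

  // Adds `n` to a named counter; unknown names are rejected to catch typos.
  def add(name: String, n: Long): Unit = {
    require(counters.contains(name), s"unknown metric: $name")
    counters(name) += n
  }

  // Runs `body` and adds its wall-clock duration (in milliseconds) to a *TimeMs metric.
  def timed[A](name: String)(body: => A): A = {
    val start = System.nanoTime()
    try body
    finally add(name, (System.nanoTime() - start) / 1000000L)
  }

  def value(name: String): Long = counters(name)
  def snapshot: Map[String, Long] = counters.toMap
}
```

A scan phase would be wrapped as `metrics.timed("scanTimeMs") { ... }`, while row and file counts are added as they become known.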

Executing Command

RunnableCommand
run(
  sparkSession: SparkSession): Seq[Row]

run is part of the RunnableCommand (Spark SQL) abstraction.

run...FIXME

performUpdate

performUpdate(
  sparkSession: SparkSession,
  deltaLog: DeltaLog,
  txn: OptimisticTransaction): Unit

performUpdate...FIXME

With persistent Deletion Vectors enabled (per shouldWritePersistentDeletionVectors), performUpdate...FIXME and uses findTouchedFiles.
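To make the soft-delete idea concrete, here is a minimal in-memory sketch of an UPDATE on the Deletion Vectors path, assuming files are plain row vectors and a deletion vector is just a set of deleted row positions. It finds the touched files, marks the matching rows as deleted in those files, and writes the updated rows to a new file. DvUpdateSketch, DataFile, update and this findTouchedFiles are hypothetical stand-ins, not the actual performUpdate or findTouchedFiles code.

```scala
// Hypothetical in-memory model of an UPDATE using deletion vectors (not Delta Lake API).
object DvUpdateSketch {
  type Row = Map[String, Any]

  // A data file plus the positions of its soft-deleted rows (a stand-in for a deletion vector).
  final case class DataFile(path: String, rows: Vector[Row], deleted: Set[Int] = Set.empty) {
    // Rows still visible to readers, paired with their position in the file.
    def visibleRows: Vector[(Row, Int)] =
      rows.zipWithIndex.filterNot { case (_, i) => deleted(i) }
  }

  // A file is touched when at least one of its visible rows matches the condition.
  def findTouchedFiles(files: Seq[DataFile], cond: Row => Boolean): Seq[DataFile] =
    files.filter(_.visibleRows.exists { case (r, _) => cond(r) })

  // Soft-deletes matching rows in touched files and writes their updated versions
  // to one new file; untouched files are returned unchanged.
  def update(files: Seq[DataFile], cond: Row => Boolean, set: Row => Row, newPath: String): Seq[DataFile] = {
    val touched = findTouchedFiles(files, cond).map(_.path).toSet
    val results = files.map { f =>
      if (!touched(f.path)) (f, Vector.empty[Row])
      else {
        val matched = f.visibleRows.filter { case (r, _) => cond(r) }
        (f.copy(deleted = f.deleted ++ matched.map(_._2)), matched.map { case (r, _) => set(r) })
      }
    }
    val kept = results.map(_._1)
    val updatedRows = results.flatMap(_._2).toVector
    if (updatedRows.isEmpty) kept else kept :+ DataFile(newPath, updatedRows)
  }
}
```

Unlike a copy-on-write rewrite, the unchanged rows of a touched file are never copied: only the matching rows are written again, and readers skip the old versions through the deleted positions.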

rewriteFiles

rewriteFiles(
  spark: SparkSession,
  txn: OptimisticTransaction,
  rootPath: Path,
  inputLeafFiles: Seq[String],
  nameToAddFileMap: Map[String, AddFile],
  condition: Expression): Seq[FileAction]

rewriteFiles...FIXME
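The signature suggests a copy-on-write rewrite: each touched file (inputLeafFiles, resolved through nameToAddFileMap) is removed and replaced by new data in which rows matching condition are updated and all other rows are copied unchanged. The following Spark-free sketch models only that idea under stated assumptions: spark, txn and rootPath are dropped, condition is a plain predicate, a hypothetical set function applies the SET clause, and all rewritten rows land in a single new file (the real command writes through Spark and may produce several files). The AddFile and RemoveFile case classes here are simplified stand-ins, not Delta Lake's actions.

```scala
// Hypothetical copy-on-write model of rewriteFiles (not Delta Lake API).
object RewriteFilesSketch {
  type Row = Map[String, Any]

  sealed trait FileAction
  final case class AddFile(path: String, rows: Vector[Row]) extends FileAction
  final case class RemoveFile(path: String) extends FileAction

  def rewriteFiles(
      inputLeafFiles: Seq[String],
      nameToAddFileMap: Map[String, AddFile],
      condition: Row => Boolean,
      set: Row => Row,
      newFilePath: String = "rewritten-0"): Seq[FileAction] = {
    if (inputLeafFiles.isEmpty) Seq.empty
    else {
      // Resolve every touched path to its AddFile; an unknown path is a caller bug.
      val touched = inputLeafFiles.map { path =>
        nameToAddFileMap.getOrElse(path, throw new IllegalStateException(s"unknown file: $path"))
      }
      // Copy-on-write: update matching rows, copy all other rows unchanged.
      val newRows = touched.flatMap(_.rows.map(r => if (condition(r)) set(r) else r)).toVector
      val removes: Seq[FileAction] = touched.map(f => RemoveFile(f.path))
      removes :+ AddFile(newFilePath, newRows)
    }
  }
}
```

This is why the numRemovedFiles and numAddedFiles metrics move together in a rewrite: every touched file is removed in full, even if only one of its rows matched.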

buildUpdatedColumns

buildUpdatedColumns(
  condition: Expression): Seq[Column]

buildUpdatedColumns...FIXME
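Given only a condition, a helper like buildUpdatedColumns plausibly produces one output column per target column of the form "if condition then update expression else original value" (the shape of SQL's CASE WHEN). The following sketch assumes that shape and models columns as plain functions over a row map instead of Spark Columns. BuildUpdatedColumnsSketch, project and the targetColumns and updates parameters are hypothetical, not the actual implementation.

```scala
// Hypothetical model of conditional column updates (not Delta Lake or Spark API).
object BuildUpdatedColumnsSketch {
  type Row = Map[String, Any]
  type Column = Row => Any

  // targetColumns: output column names; updates: column name -> new-value expression.
  def buildUpdatedColumns(
      targetColumns: Seq[String],
      updates: Map[String, Column],
      condition: Row => Boolean): Seq[(String, Column)] =
    targetColumns.map { name =>
      val original: Column = row => row(name)
      // Columns without a SET expression keep their original value.
      val update: Column = updates.getOrElse(name, original)
      name -> ((row: Row) => if (condition(row)) update(row) else original(row))
    }

  // Applies the projection to one row, producing the (possibly updated) row.
  def project(columns: Seq[(String, Column)], row: Row): Row =
    columns.map { case (name, col) => name -> col(row) }.toMap
}
```

Folding the condition into every column lets one projection handle matching and non-matching rows of a file in a single pass, without splitting them.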

shouldWritePersistentDeletionVectors

shouldWritePersistentDeletionVectors(
  spark: SparkSession,
  txn: OptimisticTransaction): Boolean

shouldWritePersistentDeletionVectors is enabled (true) when all of the following hold:

  1. The spark.databricks.delta.update.deletionVectors.persistent configuration property is enabled (true)
  2. The table's protocol and configuration support the Deletion Vectors feature
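The two checks above can be sketched as a pure function. This is a Spark-free model under stated assumptions: the session configuration is a plain Map, the table's protocol and configuration are reduced to two booleans in a hypothetical TableState, and the property's real default (defined by Delta Lake) is passed in as a `default` parameter rather than guessed. Only the configuration property name comes from the list above.

```scala
// Hypothetical, Spark-free model of shouldWritePersistentDeletionVectors (not Delta Lake API).
object ShouldWriteDvSketch {
  // Configuration property named in the list above.
  val UpdateDvConfKey = "spark.databricks.delta.update.deletionVectors.persistent"

  // Simplified view of the table's protocol and configuration.
  final case class TableState(
      protocolSupportsDeletionVectors: Boolean, // e.g. the deletionVectors table feature
      enableDeletionVectors: Boolean)           // e.g. the delta.enableDeletionVectors property

  // `default` stands in for the property's real default, which this sketch does not assume.
  def shouldWritePersistentDeletionVectors(
      sessionConf: Map[String, String],
      table: TableState,
      default: Boolean): Boolean = {
    // 1. The configuration property must be enabled.
    val confEnabled =
      sessionConf.get(UpdateDvConfKey).map(_.trim.equalsIgnoreCase("true")).getOrElse(default)
    // 2. The protocol and table configuration must both support deletion vectors.
    confEnabled && table.protocolSupportsDeletionVectors && table.enableDeletionVectors
  }
}
```

Both conditions are required: turning the property off disables the Deletion Vectors path even for tables that support it, and enabling it has no effect on tables that do not.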

shouldWritePersistentDeletionVectors is used when: