VacuumCommand Utility — Garbage Collecting Delta Table

VacuumCommand is a concrete VacuumCommandImpl for gc.

== [[gc]] Garbage Collecting Of Delta Table

[source, scala]

gc(
  spark: SparkSession,
  deltaLog: DeltaLog,
  dryRun: Boolean = true,
  retentionHours: Option[Double] = None,
  clock: Clock = new SystemClock): DataFrame

gc requests the given DeltaLog to update (which gives the latest Snapshot of the delta table).

[[gc-deleteBeforeTimestamp]] gc...FIXME (deleteBeforeTimestamp)

gc prints out the following INFO message to the logs:

Starting garbage collection (dryRun = [dryRun]) of untracked files older than [deleteBeforeTimestamp] in [path]
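How gc derives deleteBeforeTimestamp is not spelled out here. A minimal sketch in plain Scala, assuming the cutoff is the current clock time minus the retention period, with a table-level default applied when retentionHours is None, could look as follows. The object name, the helper name and the defaultRetentionMs parameter are hypothetical and used for illustration only.

```scala
import java.util.concurrent.TimeUnit

object RetentionCutoffSketch {
  // Hypothetical helper (an assumption, not the Delta Lake source):
  // returns the timestamp (ms since epoch) before which untracked files
  // become candidates for deletion.
  def deleteBeforeTimestamp(
      nowMs: Long,
      retentionHours: Option[Double],
      defaultRetentionMs: Long): Long = {
    // Convert the optional retention in hours to milliseconds,
    // falling back to the assumed default retention.
    val retentionMs = retentionHours
      .map(hours => TimeUnit.HOURS.toMillis(math.round(hours)))
      .getOrElse(defaultRetentionMs)
    nowMs - retentionMs
  }
}
```

Passing the current time in as a parameter (rather than reading the system clock) mirrors the clock argument of gc and keeps the computation easy to test.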

[[gc-validFiles]] gc requests the Snapshot for the state dataset and defines a function that is applied to every action (in a partition).


gc converts the mapped state dataset (of actions) into a DataFrame with a single path column.

[[gc-allFilesAndDirs]] gc...FIXME

gc caches the allFilesAndDirs dataset.

gc prints out the following INFO message to the logs:

Deleting untracked files and empty directories in [path]
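The INFO messages describe the deletion target as untracked files older than deleteBeforeTimestamp. A minimal sketch of that selection, using plain Scala collections instead of the Spark datasets that gc works with, might look like this. FileEntry, untracked and the parameter names are hypothetical; validPaths stands in for the single path column of valid files.

```scala
object UntrackedFilesSketch {
  // Hypothetical, simplified stand-in for one entry of a file listing.
  final case class FileEntry(path: String, modificationTimeMs: Long)

  // Keeps the paths of files that are not referenced by the table
  // (not in validPaths) and were last modified before the cutoff.
  def untracked(
      allFiles: Seq[FileEntry],
      validPaths: Set[String],
      deleteBeforeMs: Long): Seq[String] =
    allFiles
      .filter(f => !validPaths.contains(f.path) && f.modificationTimeMs < deleteBeforeMs)
      .map(_.path)
}
```

Both conditions have to hold: a recently written file that is not yet referenced by the table is kept, because its modification time is not older than the cutoff.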


gc prints out the following message to standard output:

Deleted [filesDeleted] files and directories in a total of [dirCounts] directories.


In the end, gc unpersists the allFilesAndDirs dataset.


gc is used when:

* DeltaTableOperations is requested to executeVacuum (for DeltaTable.vacuum operator)

* VacuumTableCommand is executed (for VACUUM SQL command)
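The two entry points can be exercised from user code roughly as follows. This is a sketch under assumptions: Delta Lake and Spark are on the classpath, the session is configured with the Delta SQL extension (needed for the VACUUM SQL command), and path points at an existing delta table. The object and method names are hypothetical.

```scala
import io.delta.tables.DeltaTable
import org.apache.spark.sql.{DataFrame, SparkSession}

object VacuumEntryPoints {
  // DeltaTable.vacuum operator: deletes untracked files older than
  // the given retention period (in hours).
  def vacuumWithOperator(
      spark: SparkSession,
      path: String,
      retentionHours: Double): DataFrame =
    DeltaTable.forPath(spark, path).vacuum(retentionHours)

  // VACUUM SQL command in dry-run mode: reports the files that would
  // be deleted without deleting them.
  def vacuumDryRunWithSql(spark: SparkSession, path: String): DataFrame =
    spark.sql(s"VACUUM '$path' RETAIN 168 HOURS DRY RUN")
}
```

The 168 hours used in the SQL variant equal 7 days; a shorter retention may be rejected by the retention period safety check.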

== [[checkRetentionPeriodSafety]] checkRetentionPeriodSafety Method

[source, scala]

checkRetentionPeriodSafety(
  spark: SparkSession,
  retentionMs: Option[Long],
  configuredRetention: Long): Unit


NOTE: checkRetentionPeriodSafety is used exclusively when the VacuumCommand utility is requested to gc.
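The check itself is not described here. As a sketch only, assuming the method rejects a requested retention that is shorter than the configured one and that the check can be switched off (Delta Lake exposes the spark.databricks.delta.retentionDurationCheck.enabled configuration property for that purpose), the logic could be modeled in plain Scala as follows. The checkEnabled parameter replaces the SparkSession argument and is hypothetical, as is the error message.

```scala
object RetentionSafetySketch {
  // Simplified model (an assumption, not the Delta Lake source):
  // fails with IllegalArgumentException when the requested retention
  // is shorter than the configured retention and the check is enabled.
  def checkRetentionPeriodSafety(
      checkEnabled: Boolean,
      retentionMs: Option[Long],
      configuredRetention: Long): Unit = {
    // With no explicit retention the configured one applies,
    // so forall on an empty Option yields true (safe).
    val retentionSafe = retentionMs.forall(_ >= configuredRetention)
    require(
      !checkEnabled || retentionSafe,
      s"Requested retention $retentionMs ms is shorter than the configured $configuredRetention ms")
  }
}
```

Using require keeps the method's Unit return type: a safe retention returns normally, an unsafe one raises an exception.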

== [[logging]] Logging

Enable ALL logging level for the VacuumCommand logger to see what happens inside.

Add the following line to conf/


Refer to Logging.

Last update: 2020-09-27