VacuumCommand Utility — Garbage Collecting Delta Table

VacuumCommand is a concrete VacuumCommandImpl for gc.

Garbage Collecting Of Delta Table — gc Utility

gc(
  spark: SparkSession,
  deltaLog: DeltaLog,
  dryRun: Boolean = true,
  retentionHours: Option[Double] = None,
  clock: Clock = new SystemClock): DataFrame

gc requests the given DeltaLog to update (which gives the latest Snapshot of the delta table).

gc…​FIXME (deleteBeforeTimestamp)

gc prints out the following INFO message to the logs:

Starting garbage collection (dryRun = [dryRun]) of untracked files older than [deleteBeforeTimestamp] in [path]
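
The section above leaves the computation of deleteBeforeTimestamp open. The following is a minimal sketch, not the actual implementation, that assumes the cutoff is the current time minus the retention period. The GcCutoffSketch object, its helper names, the defaultRetentionMillis parameter and the ISO-8601 timestamp rendering are all hypothetical and used for illustration only. The message text itself is taken from the log line above.

```scala
import java.time.Instant

object GcCutoffSketch {
  // Hypothetical helper: converts (possibly fractional) hours to milliseconds.
  def hoursToMillis(hours: Double): Long = (hours * 60 * 60 * 1000).toLong

  // Assumption: cutoff = now - retention. When no retentionHours is given,
  // a caller-supplied default retention (in milliseconds) is used instead.
  def deleteBeforeTimestamp(
      nowMillis: Long,
      retentionHours: Option[Double],
      defaultRetentionMillis: Long): Long =
    nowMillis - retentionHours.map(hoursToMillis).getOrElse(defaultRetentionMillis)

  // Builds the INFO message shown above; rendering the timestamp as an
  // ISO-8601 instant is an assumption of this sketch.
  def startMessage(dryRun: Boolean, deleteBefore: Long, path: String): String =
    s"Starting garbage collection (dryRun = $dryRun) of untracked files older than " +
      s"${Instant.ofEpochMilli(deleteBefore)} in $path"
}
```

Passing the current time in as a parameter (rather than reading the system clock inside the helper) mirrors the clock parameter of gc and keeps the computation easy to test.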

gc requests the Snapshot for the state dataset and defines a function for every action (in a partition) that does the following:

  1. FIXME

gc converts the mapped state dataset (of actions) into a DataFrame with a single path column.


gc caches the allFilesAndDirs dataset.

gc prints out the following INFO message to the logs:

Deleting untracked files and empty directories in [path]


gc prints out the following message to standard output:

Deleted [filesDeleted] files and directories in a total of [dirCounts] directories.


In the end, gc unpersists the allFilesAndDirs dataset.
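
The cache-then-unpersist lifecycle described above can be sketched in plain Scala. This is only an illustration of one common way to guarantee the final unpersist step (a try/finally block); whether gc is written this way is an assumption. Cacheable and withCached are hypothetical names standing in for a Spark dataset and the surrounding logic.

```scala
object CacheScopeSketch {
  // Hypothetical stand-in for a cacheable dataset
  // (Spark's Dataset offers cache() and unpersist()).
  trait Cacheable {
    def cache(): Unit
    def unpersist(): Unit
  }

  // Caches the data, runs the body, and always unpersists afterwards,
  // even when the body throws.
  def withCached[A <: Cacheable, B](data: A)(body: A => B): B = {
    data.cache()
    try body(data)
    finally data.unpersist()
  }
}
```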

gc is used when:

checkRetentionPeriodSafety Method

checkRetentionPeriodSafety(
  spark: SparkSession,
  retentionMs: Option[Long],
  configuredRetention: Long): Unit


checkRetentionPeriodSafety is used exclusively when the VacuumCommand utility is requested to gc.
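
This section does not describe what the check does. As a hedged sketch only, a retention safety check of this shape could reject a negative retention period and, when the check is enabled, a requested retention shorter than the configured one. The checkEnabled flag below is a hypothetical stand-in for a session-level setting (the documented method takes a SparkSession instead), and the error messages are illustrative, not the actual ones.

```scala
object RetentionCheckSketch {
  // Throws IllegalArgumentException (via require) when the requested
  // retention is negative, or when the check is enabled and the requested
  // retention is shorter than the configured one.
  def checkRetentionPeriodSafety(
      checkEnabled: Boolean, // hypothetical: replaces the SparkSession lookup
      retentionMs: Option[Long],
      configuredRetention: Long): Unit = {
    require(retentionMs.forall(_ >= 0), "Retention period must not be negative")
    // With no explicit retention, the configured one applies and is safe.
    val retentionSafe = retentionMs.forall(_ >= configuredRetention)
    require(
      !checkEnabled || retentionSafe,
      s"Requested retention $retentionMs ms is shorter than the configured $configuredRetention ms")
  }
}
```

For example, with the check enabled, `checkRetentionPeriodSafety(true, Some(500L), 1000L)` throws, while the same call with the check disabled returns normally.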


Enable ALL logging level for logger to see what happens inside.

Add the following line to conf/

Refer to Logging.