

MetadataCleanup is an abstraction of metadata cleaners that can clean up expired checkpoints and delta logs of a delta table.

MetadataCleanup can be used with DeltaLog (or its subtypes) only.


Table Properties


MetadataCleanup uses the delta.enableExpiredLogCleanup table property to control log cleanup.

MetadataCleanup uses the delta.logRetentionDuration table property in cleanUpExpiredLogs (to determine fileCutOffTime).

Cleaning Up Expired Logs

doLogCleanup(): Unit

doLogCleanup is part of the Checkpoints abstraction.

doLogCleanup executes cleanUpExpiredLogs when log cleanup is enabled (per the delta.enableExpiredLogCleanup table property).
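
The gating described above can be sketched in a few lines. This is a minimal illustration, not the Delta Lake source: the LogCleanupGate object, the plain Map of table properties, and the cleanUp callback are hypothetical, and treating a missing property as enabled is an assumption.

```scala
object LogCleanupGate {
  // Table property named in this document
  val EnableKey = "delta.enableExpiredLogCleanup"

  // Runs `cleanUp` only when the property is enabled.
  // Assumption: a missing property is treated as enabled.
  def doLogCleanup(tableProperties: Map[String, String])(cleanUp: () => Unit): Unit =
    if (tableProperties.get(EnableKey).forall(_.toBoolean)) cleanUp()
}
```

With Map("delta.enableExpiredLogCleanup" -> "false") the callback is never invoked.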


cleanUpExpiredLogs(): Unit

cleanUpExpiredLogs calculates a fileCutOffTime based on the current time and the logRetentionDuration table property.
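
As a rough sketch of that calculation, the cut-off can be thought of as the current time minus the retention period. The FileCutOff object and its parameters are hypothetical, and the retention is assumed to be already parsed into a java.time.Duration; the actual implementation may apply additional rounding.

```scala
import java.time.{Duration, Instant}

object FileCutOff {
  // Hypothetical helper: cut-off (epoch millis) = current time - retention period
  def fileCutOffTime(nowMillis: Long, retention: Duration): Long =
    nowMillis - retention.toMillis

  def main(args: Array[String]): Unit = {
    val now = Instant.parse("2024-03-31T12:00:00Z").toEpochMilli
    val cutOff = fileCutOffTime(now, Duration.ofDays(30))
    println(Instant.ofEpochMilli(cutOff)) // prints 2024-03-01T12:00:00Z
  }
}
```

Log files last modified before this timestamp are the candidates for deletion.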

cleanUpExpiredLogs prints out the following INFO message to the logs:

Starting the deletion of log files older than [date]

cleanUpExpiredLogs finds the expired delta logs (based on the fileCutOffTime) and deletes the files (using Hadoop's FileSystem.delete non-recursively). cleanUpExpiredLogs counts the deleted files (and reports the count in the summary INFO message).

In the end, cleanUpExpiredLogs prints out the following INFO message to the logs:

Deleted [numDeleted] log files older than [date]
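
The delete-and-count flow can be illustrated with a simplified, self-contained sketch. It swaps Hadoop's FileSystem for java.nio.file so that it runs without extra dependencies, and it selects files purely by modification time, which is a simplification of how expired logs are found. ExpiredLogCleanupSketch and the file names are hypothetical; scala.jdk.CollectionConverters assumes Scala 2.13 or later.

```scala
import java.nio.file.{Files, Path}
import java.nio.file.attribute.FileTime
import scala.jdk.CollectionConverters._

object ExpiredLogCleanupSketch {
  // Deletes regular files directly under `logDir` that were last modified
  // before `fileCutOffTime` (epoch millis) and returns the number deleted.
  def cleanUpExpiredLogs(logDir: Path, fileCutOffTime: Long): Int = {
    val stream = Files.list(logDir)
    try {
      val expired = stream.iterator().asScala
        .filter(p => Files.isRegularFile(p))
        .filter(p => Files.getLastModifiedTime(p).toMillis < fileCutOffTime)
        .toList
      // deleteIfExists removes one file at a time (no recursive delete)
      expired.count(p => Files.deleteIfExists(p))
    } finally stream.close()
  }

  def main(args: Array[String]): Unit = {
    val dir = Files.createTempDirectory("delta_log_sketch")
    val old = Files.createFile(dir.resolve("00000000000000000000.json"))
    Files.createFile(dir.resolve("00000000000000000001.json"))
    val now = System.currentTimeMillis()
    // Backdate one file so that it falls before the cut-off
    Files.setLastModifiedTime(old, FileTime.fromMillis(now - 100000L))
    val cutOff = now - 50000L
    val numDeleted = cleanUpExpiredLogs(dir, cutOff)
    println(s"Deleted $numDeleted log files older than ${new java.util.Date(cutOff)}")
  }
}
```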

Finding Expired Log Files

listExpiredDeltaLogs(
  fileCutOffTime: Long): Iterator[FileStatus]

listExpiredDeltaLogs loads the most recent checkpoint if available.

If the last checkpoint is not available, listExpiredDeltaLogs returns an empty iterator.

listExpiredDeltaLogs requests the LogStore for the paths (in the same directory) that are (lexicographically) greater than or equal to the 0th checkpoint file (per checkpointPrefix format), and keeps only the checkpoint and delta files in the log directory.
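
To make the lexicographic listing concrete, here is a small sketch over plain file names. It assumes the usual Delta naming scheme (a 20-digit zero-padded version followed by .json for delta files or .checkpoint...parquet for checkpoints). ListFromSketch, its in-memory listFrom, and the sample names are hypothetical stand-ins for the LogStore and a real log directory.

```scala
object ListFromSketch {
  // Assumed naming: 20-digit zero-padded version, then a suffix
  def checkpointPrefix(version: Long): String = f"$version%020d.checkpoint"

  def isDeltaFile(name: String): Boolean =
    name.matches("\\d{20}\\.json")

  def isCheckpointFile(name: String): Boolean =
    name.matches("\\d{20}\\.checkpoint(\\.\\d+\\.\\d+)?\\.parquet")

  // Stand-in for a LogStore listing: sorted names that are
  // lexicographically greater than or equal to `start`
  def listFrom(names: Seq[String], start: String): Seq[String] =
    names.sorted.filter(_.compareTo(start) >= 0)

  // Lists from the 0th checkpoint prefix, keeping checkpoint and delta files only
  def candidates(names: Seq[String]): Seq[String] =
    listFrom(names, checkpointPrefix(0))
      .filter(n => isDeltaFile(n) || isCheckpointFile(n))

  def main(args: Array[String]): Unit = {
    val names = Seq(
      "_last_checkpoint",
      "00000000000000000000.crc",
      "00000000000000000000.json",
      "00000000000000000010.checkpoint.parquet",
      "00000000000000000011.json")
    candidates(names).foreach(println)
  }
}
```

In this sample, _last_checkpoint and the .crc file both sort at or after the prefix, so the listing returns them, but the file-type filter drops them.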

In the end, listExpiredDeltaLogs creates a BufferingLogDeletionIterator that...FIXME


Enable ALL logging level for the Implementations logger to see what happens inside.