
MetadataCleanup

MetadataCleanup is an abstraction of metadata cleaners that can clean up expired checkpoints and delta logs of a delta table.

MetadataCleanup can be used with DeltaLog (or its subtypes) only.

Implementations

Table Properties

enableExpiredLogCleanup

MetadataCleanup uses the delta.enableExpiredLogCleanup table property to control whether log cleanup is enabled.

logRetentionDuration

MetadataCleanup uses the delta.logRetentionDuration table property in cleanUpExpiredLogs (to determine fileCutOffTime).
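
As a rough illustration, both table properties can be set with an ALTER TABLE ... SET TBLPROPERTIES statement. The sketch below only assembles the SQL text and does not need Spark on the classpath. The table name `events` and the helper `setCleanupPropertiesSql` are hypothetical, and the values are examples, not recommendations.

```scala
object CleanupProperties {
  // Hypothetical helper (not part of Delta Lake): builds the SQL text
  // that sets both cleanup-related table properties.
  def setCleanupPropertiesSql(
      table: String,
      enabled: Boolean,
      retention: String): String =
    s"""ALTER TABLE $table SET TBLPROPERTIES (
       |  'delta.enableExpiredLogCleanup' = '$enabled',
       |  'delta.logRetentionDuration' = '$retention'
       |)""".stripMargin

  def main(args: Array[String]): Unit = {
    // `events` is a hypothetical table name.
    val sql = setCleanupPropertiesSql("events", enabled = true, retention = "interval 30 days")
    println(sql)
    // With a SparkSession in scope, the statement could be run as spark.sql(sql).
  }
}
```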

Cleaning Up Expired Logs

doLogCleanup(): Unit

doLogCleanup is part of the Checkpoints abstraction.

doLogCleanup executes cleanUpExpiredLogs when log cleanup is enabled (per the delta.enableExpiredLogCleanup table property).

cleanUpExpiredLogs

cleanUpExpiredLogs(): Unit

cleanUpExpiredLogs calculates a fileCutOffTime based on the current time and the logRetentionDuration table property.
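
The calculation can be sketched as the current time minus the retention period. This is a minimal sketch under that assumption: the object `FileCutOff` and the fixed timestamps are illustrative only, and the actual implementation may adjust the value further (for example, by rounding it), which is not shown here.

```scala
import java.time.{Duration, Instant}

object FileCutOff {
  // Cutoff = current time minus the retention period, in epoch milliseconds.
  // Log files last modified before this instant are candidates for deletion.
  def fileCutOffTime(now: Instant, retention: Duration): Long =
    now.minus(retention).toEpochMilli

  def main(args: Array[String]): Unit = {
    val now = Instant.parse("2024-03-31T12:00:00Z")
    val retention = Duration.ofDays(30) // stands in for "interval 30 days"
    println(Instant.ofEpochMilli(fileCutOffTime(now, retention)))
    // prints 2024-03-01T12:00:00Z
  }
}
```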

cleanUpExpiredLogs prints out the following INFO message to the logs:

Starting the deletion of log files older than [date]

cleanUpExpiredLogs finds the expired delta logs (based on the fileCutOffTime) and deletes the files (using Hadoop's FileSystem.delete non-recursively). cleanUpExpiredLogs counts the deleted files (and reports the number in the summary INFO message).

In the end, cleanUpExpiredLogs prints out the following INFO message to the logs:

Deleted [numDeleted] log files older than [date]
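
The overall flow (the enabled check, the non-recursive delete, and the counter) can be approximated as below. This is a simplified sketch, not the Delta Lake code: it uses java.io.File in place of Hadoop's FileSystem, treats every regular file older than the cutoff as expired (the real selection is done by listExpiredDeltaLogs), and the object and file names are made up for the example.

```scala
import java.io.File
import java.nio.file.Files

object ExpiredLogCleanupSketch {
  // Deletes regular files in `dir` last modified before `fileCutOffTime`
  // (epoch millis) and returns the number of files actually deleted.
  def cleanUp(dir: File, fileCutOffTime: Long, enabled: Boolean): Int =
    if (!enabled) 0 // mirrors the "when enabled" gate of doLogCleanup
    else {
      val date = new java.util.Date(fileCutOffTime)
      println(s"Starting the deletion of log files older than $date")
      val files = Option(dir.listFiles()).getOrElse(Array.empty[File])
      val expired = files.filter(f => f.isFile && f.lastModified() < fileCutOffTime)
      // File.delete is non-recursive; only successful deletions are counted.
      val numDeleted = expired.count(_.delete())
      println(s"Deleted $numDeleted log files older than $date")
      numDeleted
    }

  def main(args: Array[String]): Unit = {
    val dayMs = 24L * 60 * 60 * 1000
    val dir = Files.createTempDirectory("delta-log-sketch").toFile
    val old = new File(dir, "00000000000000000000.json")    // assumed file name
    val recent = new File(dir, "00000000000000000001.json") // assumed file name
    old.createNewFile()
    recent.createNewFile()
    val now = System.currentTimeMillis()
    old.setLastModified(now - 40 * dayMs) // pretend the file is 40 days old
    cleanUp(dir, now - 30 * dayMs, enabled = true) // 30-day retention
  }
}
```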

Finding Expired Log Files

listExpiredDeltaLogs(
  fileCutOffTime: Long): Iterator[FileStatus]

listExpiredDeltaLogs loads the most recent checkpoint if available.

If the last checkpoint is not available, listExpiredDeltaLogs returns an empty iterator.

listExpiredDeltaLogs requests the LogStore for the checkpoint and delta files in the log directory whose paths are lexicographically greater than or equal to the 0th checkpoint file (per the checkpointPrefix format).
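
To illustrate what a lexicographic "list from" means, the sketch below filters plain file names rather than calling a LogStore. It assumes file names start with a zero-padded 20-digit version, which is what makes string order agree with version order. The object `ListFromSketch`, both helpers, and the sample names are hypothetical.

```scala
object ListFromSketch {
  // Assumed naming scheme: zero-padded 20-digit version plus a suffix.
  def checkpointPrefix(version: Long): String = f"$version%020d.checkpoint"

  // Names that sort at or after `start`, in lexicographic order.
  def listFrom(names: Seq[String], start: String): Seq[String] =
    names.sorted.filter(_.compareTo(start) >= 0)

  def main(args: Array[String]): Unit = {
    val names = Seq(
      ".00000000000000000001.json.crc", // sorts before the prefix ('.' < '0')
      "00000000000000000000.json",
      "00000000000000000001.json",
      "00000000000000000002.checkpoint.parquet",
      "00000000000000000002.json")
    // Starting from the 0th checkpoint prefix keeps every versioned file.
    listFrom(names, checkpointPrefix(0)).foreach(println)
  }
}
```

Because ".checkpoint" sorts before ".json", starting from the checkpoint prefix of a version also includes the delta file of that same version.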

In the end, listExpiredDeltaLogs creates a BufferingLogDeletionIterator that...FIXME

Logging

Enable the ALL logging level for the logger of the implementation (see Implementations) to see what happens inside.