MetadataCleanup¶
MetadataCleanup is an abstraction of metadata cleaners that can clean up expired checkpoints and delta logs of a delta table.
MetadataCleanup can only be used with DeltaLog (or its subtypes).
Implementations¶
Table Properties¶
enableExpiredLogCleanup¶
MetadataCleanup uses the delta.enableExpiredLogCleanup table property to control log cleanup.
logRetentionDuration¶
MetadataCleanup uses the delta.logRetentionDuration table property for cleanUpExpiredLogs (to determine fileCutOffTime).
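As an illustration, both table properties can be set with Spark SQL. The following is only a sketch: it assumes Spark and Delta Lake are on the classpath, delta_demo is a hypothetical table name, and the property values are examples rather than recommendations.

```scala
import org.apache.spark.sql.SparkSession

object TablePropertiesSketch {
  def main(args: Array[String]): Unit = {
    // Local session with the Delta Lake SQL extension and catalog enabled
    val spark = SparkSession.builder()
      .master("local[*]")
      .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
      .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
      .getOrCreate()

    // delta_demo is a hypothetical table used only for this example
    spark.range(5).write.format("delta").saveAsTable("delta_demo")

    // Example values only: keep log cleanup on and retain logs for 7 days
    spark.sql(
      """ALTER TABLE delta_demo SET TBLPROPERTIES (
        |  'delta.enableExpiredLogCleanup' = 'true',
        |  'delta.logRetentionDuration' = 'interval 7 days'
        |)""".stripMargin)

    spark.stop()
  }
}
```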
Cleaning Up Expired Logs¶
doLogCleanup runs cleanUpExpiredLogs when enabled (per the delta.enableExpiredLogCleanup table property).
cleanUpExpiredLogs¶
cleanUpExpiredLogs(): Unit
cleanUpExpiredLogs calculates a fileCutOffTime based on the current time and the logRetentionDuration table property.
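The cutoff calculation can be pictured as "current time minus retention". The sketch below is a simplified illustration, not Delta Lake's code: fileCutOffTime here is a hypothetical helper, the 30-day retention is an example value, and the real implementation may adjust the result further (for example, by rounding).

```scala
import java.time.{Duration, Instant}

object CutOffSketch {
  // Hypothetical helper (not Delta Lake's API): the cutoff is "now minus retention",
  // expressed in epoch milliseconds.
  def fileCutOffTime(now: Instant, logRetention: Duration): Long =
    now.minus(logRetention).toEpochMilli

  def main(args: Array[String]): Unit = {
    val now = Instant.parse("2024-03-31T12:00:00Z")
    val cutoff = fileCutOffTime(now, Duration.ofDays(30)) // example retention only
    println(Instant.ofEpochMilli(cutoff)) // prints 2024-03-01T12:00:00Z
  }
}
```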
cleanUpExpiredLogs prints out the following INFO message to the logs:
Starting the deletion of log files older than [date]
cleanUpExpiredLogs finds the expired delta logs (based on the fileCutOffTime) and deletes the files (using Hadoop's FileSystem.delete non-recursively). cleanUpExpiredLogs counts the deleted files (and uses the count in the summary INFO message).
In the end, cleanUpExpiredLogs prints out the following INFO message to the logs:
Deleted [numDeleted] log files older than [date]
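The find, delete, and count flow described above can be sketched with the JDK alone. This is a stand-in, not the actual implementation: it uses java.nio.file instead of Hadoop's FileSystem, deleteExpired is a hypothetical name, and it assumes a file is expired when its modification time is at or before fileCutOffTime.

```scala
import java.nio.file.{Files, Path}
import scala.jdk.CollectionConverters._

object LogCleanupSketch {
  // Hypothetical stand-in for the deletion loop. Assumption: a file counts as
  // expired when its modification time is at or before fileCutOffTime.
  def deleteExpired(logDir: Path, fileCutOffTime: Long): Int = {
    val listing = Files.list(logDir)
    try {
      val expired = listing.iterator().asScala
        .filter(p => Files.isRegularFile(p))
        .filter(p => Files.getLastModifiedTime(p).toMillis <= fileCutOffTime)
        .toList // materialize the listing before deleting anything
      // Each call removes one regular file (no directory trees) and returns
      // true only when a file was actually deleted, so count yields numDeleted.
      expired.count(p => Files.deleteIfExists(p))
    } finally listing.close()
  }
}
```

The returned count plays the role of numDeleted in the summary message.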
Finding Expired Log Files¶
listExpiredDeltaLogs(
  fileCutOffTime: Long): Iterator[FileStatus]
listExpiredDeltaLogs loads the most recent checkpoint if available.
If the last checkpoint is not available, listExpiredDeltaLogs returns an empty iterator.
listExpiredDeltaLogs requests the LogStore for the paths (in the same directory) that are (lexicographically) greater than or equal to the 0th checkpoint file (per checkpointPrefix format), considering only the checkpoint and delta files in the log directory.
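The early exit and the lexicographic listing can be sketched on plain file names. This is an illustration under stated assumptions, not the actual code: it assumes log file names carry a version zero-padded to 20 digits, the regular expressions are simplified guesses at delta and checkpoint file names, and candidates is a hypothetical helper that works on strings rather than a LogStore.

```scala
object ListExpiredSketch {
  // Assumption: versions are zero-padded to 20 digits in log file names.
  def checkpointPrefix(version: Long): String = f"$version%020d.checkpoint"

  // Simplified, assumed name patterns for delta and checkpoint files
  private val deltaFile = """\d+\.json""".r
  private val checkpointFile = """\d+\.checkpoint(\.\d+\.\d+)?\.parquet""".r

  private def isDeltaOrCheckpoint(name: String): Boolean =
    deltaFile.pattern.matcher(name).matches() ||
      checkpointFile.pattern.matcher(name).matches()

  // lastCheckpointVersion is None when no checkpoint is available.
  def candidates(lastCheckpointVersion: Option[Long], names: Seq[String]): Iterator[String] =
    lastCheckpointVersion match {
      case None => Iterator.empty // no checkpoint, nothing to consider
      case Some(_) =>
        val start = checkpointPrefix(0)
        names.sorted.iterator
          .filter(_ >= start)          // lexicographically at or after the 0th checkpoint prefix
          .filter(isDeltaOrCheckpoint) // keep delta and checkpoint files only
    }
}
```

Names such as _last_checkpoint pass the lexicographic check but are dropped by the name filter, while names starting with a dot sort before the prefix and never get that far.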
In the end, listExpiredDeltaLogs creates a BufferingLogDeletionIterator that...FIXME
Logging¶
Enable ALL logging level for the Implementations logger to see what happens inside.