
MetadataCleanup

MetadataCleanup is an abstraction of <> that can <> the <>.

[[implementations]][[self]] NOTE: <> is the default and only known MetadataCleanup in Delta Lake.

[[logging]]
[TIP]
====
Enable ALL logging level for org.apache.spark.sql.delta.MetadataCleanup logger to see what happens inside.

Add the following line to conf/log4j.properties:

log4j.logger.org.apache.spark.sql.delta.MetadataCleanup=ALL

Refer to Logging.
====

== [[doLogCleanup]] doLogCleanup Method

[source, scala]

doLogCleanup(): Unit

[NOTE]

doLogCleanup is part of the <> to...FIXME.

Interestingly, the MetadataCleanup and <> abstractions must be used with <> only.

doLogCleanup <<cleanUpExpiredLogs, cleans up expired logs>> when the <<enableExpiredLogCleanup, enableExpiredLogCleanup>> table property is enabled.
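
The gating described above can be illustrated with a minimal sketch. It assumes that the guarded action is cleanUpExpiredLogs. MetadataCleanupSketch and CountingCleanup are hypothetical names used for illustration only; they are not Delta Lake classes.

```scala
// Hypothetical, simplified stand-in for the MetadataCleanup abstraction.
// Only the gating on the table property is modelled here.
trait MetadataCleanupSketch {
  // Value of the enableExpiredLogCleanup table property
  def enableExpiredLogCleanup: Boolean

  // Stand-in for the actual deletion of expired log files
  def cleanUpExpiredLogs(): Unit

  // Runs the cleanup only when the table property is enabled
  def doLogCleanup(): Unit = {
    if (enableExpiredLogCleanup) {
      cleanUpExpiredLogs()
    }
  }
}

// Hypothetical implementation that only counts how often the cleanup ran
final class CountingCleanup(val enableExpiredLogCleanup: Boolean)
    extends MetadataCleanupSketch {
  var cleanups: Int = 0
  def cleanUpExpiredLogs(): Unit = cleanups += 1
}
```

With the property disabled, calling doLogCleanup on a CountingCleanup leaves the counter at zero.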

== [[enableExpiredLogCleanup]] enableExpiredLogCleanup Table Property -- enableExpiredLogCleanup Method

[source, scala]

enableExpiredLogCleanup: Boolean

enableExpiredLogCleanup gives the value of the enableExpiredLogCleanup table property (<> the <>).

NOTE: enableExpiredLogCleanup is used exclusively when MetadataCleanup is requested to <<doLogCleanup, doLogCleanup>>.
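
As a rough illustration of reading a boolean table property, the following sketch looks the flag up in a plain configuration map. The property key (with the `delta.` prefix), the default of true, and the ExpiredLogCleanupFlag object are assumptions made for this example and are not taken from this page.

```scala
// Hypothetical helper, not part of Delta Lake
object ExpiredLogCleanupFlag {
  // Assumed property key and default value
  val Key: String = "delta.enableExpiredLogCleanup"
  val Default: Boolean = true

  // Reads the flag from a table's configuration, falling back to the default
  def enableExpiredLogCleanup(configuration: Map[String, String]): Boolean =
    configuration.get(Key).map(_.trim.toBoolean).getOrElse(Default)
}
```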

== [[deltaRetentionMillis]] logRetentionDuration Table Property -- deltaRetentionMillis Method

[source, scala]

deltaRetentionMillis: Long

deltaRetentionMillis gives the value of the logRetentionDuration table property (<> the <>).

NOTE: deltaRetentionMillis is used when...FIXME
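
The Long return type can be motivated with a small sketch that converts a retention period into milliseconds. The LogRetention object and the 30-day figure are illustrative assumptions, not values taken from this page.

```scala
import java.time.Duration

// Hypothetical helper, not part of Delta Lake
object LogRetention {
  // A retention period given in whole days, expressed in milliseconds
  def retentionMillis(days: Long): Long = Duration.ofDays(days).toMillis
}
```

For example, 30 days is 2,592,000,000 milliseconds, which is already larger than Int.MaxValue (2,147,483,647), so an Int would overflow.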

== [[cleanUpExpiredLogs]] cleanUpExpiredLogs Internal Method

[source, scala]

cleanUpExpiredLogs(): Unit

cleanUpExpiredLogs calculates a so-called fileCutOffTime based on the <> and the <<deltaRetentionMillis, logRetentionDuration>> table property.

cleanUpExpiredLogs prints out the following INFO message to the logs:

Starting the deletion of log files older than [date]

cleanUpExpiredLogs <<listExpiredDeltaLogs, finds the expired delta logs>> (based on the fileCutOffTime) and deletes the files (using Hadoop's FileSystem.delete non-recursively).

In the end, cleanUpExpiredLogs prints out the following INFO message to the logs:

Deleted [numDeleted] log files older than [date]

NOTE: cleanUpExpiredLogs is used exclusively when MetadataCleanup is requested to <<doLogCleanup, doLogCleanup>>.
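
The overall flow of cleanUpExpiredLogs can be sketched without Hadoop. The sketch below makes several assumptions: the cut-off is simply the current time minus the retention period, a file counts as expired when its modification time is older than the cut-off, println stands in for INFO logging, and deletion is a caller-supplied function rather than FileSystem.delete. CleanupSketch and LogFile are hypothetical names.

```scala
import java.util.Date

// Hypothetical model of the cleanup flow, not the Delta Lake implementation
object CleanupSketch {
  // Stand-in for a file in the transaction log directory
  final case class LogFile(path: String, modificationTime: Long)

  // Assumed rule: the cut-off is the current time minus the retention period
  def fileCutOffTime(nowMillis: Long, retentionMillis: Long): Long =
    nowMillis - retentionMillis

  // Deletes every expired file through `delete` and returns
  // the number of files that were successfully deleted
  def cleanUpExpiredLogs(
      files: Seq[LogFile],
      nowMillis: Long,
      retentionMillis: Long)(delete: String => Boolean): Int = {
    val cutOff = fileCutOffTime(nowMillis, retentionMillis)
    val date = new Date(cutOff)
    println(s"Starting the deletion of log files older than $date")
    // Only files older than the cut-off are passed to `delete`
    val numDeleted = files
      .filter(_.modificationTime < cutOff)
      .count(file => delete(file.path))
    println(s"Deleted $numDeleted log files older than $date")
    numDeleted
  }
}
```

Passing the delete function as a parameter keeps the sketch self-contained; with Hadoop available, it could wrap a non-recursive FileSystem.delete call.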

== [[listExpiredDeltaLogs]] Finding Expired Delta Logs -- listExpiredDeltaLogs Internal Method

[source, scala]

listExpiredDeltaLogs(fileCutOffTime: Long): Iterator[FileStatus]


listExpiredDeltaLogs...FIXME

listExpiredDeltaLogs requests the <> for the <> that are (lexicographically) greater than or equal to the 0th checkpoint file (per <> format) of the <> and <> files in the <> (of the <>).

In the end, listExpiredDeltaLogs creates a BufferingLogDeletionIterator that...FIXME

NOTE: listExpiredDeltaLogs is used exclusively when MetadataCleanup is requested to <<cleanUpExpiredLogs, clean up expired logs>>.
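
The lexicographic selection of candidate files can be illustrated on plain file names. The file-name shapes below (a 20-digit zero-padded version followed by `.json` for delta files, or by `.checkpoint` and `.parquet` for checkpoint files) are assumptions of this sketch, as is the ExpiredLogListingSketch object. The BufferingLogDeletionIterator step is not modelled.

```scala
// Hypothetical illustration, not the Delta Lake implementation
object ExpiredLogListingSketch {
  // Assumed file-name shapes for delta and checkpoint files
  private val DeltaFile = """\d{20}\.json""".r
  private val CheckpointFile = """\d{20}\.checkpoint(\.\d+\.\d+)?\.parquet""".r

  // Name prefix of the 0th checkpoint file; the listing starts from here
  val ZerothCheckpointPrefix: String = "%020d.checkpoint".format(0L)

  def isDeltaFile(name: String): Boolean =
    DeltaFile.pattern.matcher(name).matches()

  def isCheckpointFile(name: String): Boolean =
    CheckpointFile.pattern.matcher(name).matches()

  // Keeps names at or after the 0th checkpoint prefix (in lexicographic order)
  // that look like checkpoint or delta files
  def candidates(names: Seq[String]): Seq[String] =
    names.sorted
      .filter(_ >= ZerothCheckpointPrefix)
      .filter(name => isCheckpointFile(name) || isDeltaFile(name))
}
```

Note that a name such as `00000000000000000000.crc` sorts after the 0th checkpoint prefix, so it passes the lexicographic step and is only dropped by the file-type check.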


Last update: 2020-10-05