VacuumCommand Utility: Garbage Collecting Delta Table
VacuumCommand is a concrete VacuumCommandImpl for gc.
Garbage Collecting Of Delta Table
gc(
  spark: SparkSession,
  deltaLog: DeltaLog,
  dryRun: Boolean = true,
  retentionHours: Option[Double] = None,
  clock: Clock = new SystemClock): DataFrame
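A minimal usage sketch of a dry run, based only on the signature above. It assumes Spark and Delta Lake are on the classpath, that DeltaLog.forTable is used to obtain the DeltaLog, and that tablePath (a hypothetical placeholder) points at an existing delta table. The object name and the retention of 168 hours are illustrative choices, not values taken from this page.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.delta.DeltaLog
import org.apache.spark.sql.delta.commands.VacuumCommand

object VacuumDryRunSketch {
  // tablePath is a hypothetical placeholder for the location of an existing delta table.
  def dryRun(spark: SparkSession, tablePath: String): DataFrame = {
    val deltaLog = DeltaLog.forTable(spark, tablePath)
    // dryRun = true is the default in the signature; it is spelled out here for clarity.
    // 168 hours (7 days) is an illustrative retention, not a recommendation.
    VacuumCommand.gc(spark, deltaLog, dryRun = true, retentionHours = Some(168.0))
  }
}
```

Keeping the call behind a small function makes it easy to try different retention values against a test table before running a real vacuum.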
gc requests the given DeltaLog to <
[[gc-deleteBeforeTimestamp]] gc...FIXME (deleteBeforeTimestamp)
gc prints out the following INFO message to the logs:
Starting garbage collection (dryRun = [dryRun]) of untracked files older than [deleteBeforeTimestamp] in [path]
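The deleteBeforeTimestamp in the message above acts as a cutoff for file age. As a rough sketch of how such a cutoff could be derived (an assumption for illustration, not a quote of the Delta Lake source), the example below subtracts a retention period from the current time. The object name, the nowMillis parameter (standing in for the Clock) and the defaultRetentionMillis fallback are all hypothetical.

```scala
import java.util.concurrent.TimeUnit

object DeleteBeforeTimestampSketch {
  // Hypothetical helper, not part of Delta Lake's API.
  // nowMillis stands in for the current time reported by a clock (epoch milliseconds).
  def deleteBeforeTimestamp(
      nowMillis: Long,
      retentionHours: Option[Double],
      defaultRetentionMillis: Long): Long = {
    // Use the requested retention when given, otherwise fall back to the default.
    val retentionMillis = retentionHours
      .map(h => TimeUnit.HOURS.toMillis(math.round(h)))
      .getOrElse(defaultRetentionMillis)
    nowMillis - retentionMillis
  }
}
```

Passing the time in as a plain value mirrors the clock parameter of gc: a test can fix the current time and check the cutoff deterministically.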
[[gc-validFiles]] gc requests the Snapshot for the < . FIXME
gc converts the mapped state dataset (of actions) into a DataFrame with a single path column.
[[gc-allFilesAndDirs]] gc...FIXME
gc caches the <
gc prints out the following INFO message to the logs:
Deleting untracked files and empty directories in [path]
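The log messages speak of untracked files older than a cutoff. One way to picture that selection is sketched below over plain Scala collections rather than Spark Datasets. FileEntry, validPaths and the function name are hypothetical, and the handling of empty directories is left out.

```scala
object UntrackedFilesSketch {
  // Hypothetical stand-in for one entry of a file listing.
  final case class FileEntry(path: String, modificationTime: Long)

  // Returns the paths that are older than the cutoff and not among the valid paths.
  def untrackedOlderThan(
      allFiles: Seq[FileEntry],
      validPaths: Set[String],
      deleteBeforeTimestamp: Long): Seq[String] =
    allFiles
      .filter(_.modificationTime < deleteBeforeTimestamp) // old enough to be considered
      .map(_.path)
      .filterNot(validPaths.contains)                     // not referenced as a valid file
}
```

In this sketch a file is a deletion candidate only when both conditions hold, so a recent untracked file and an old valid file are both kept.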
gc...FIXME
gc prints out the following message to standard output:
Deleted [filesDeleted] files and directories in a total of [dirCounts] directories.
gc...FIXME
In the end, gc unpersists the <
[NOTE]
gc is used when:
* DeltaTableOperations is requested to <> (for <> operator)
* <> is executed (for delta-sql.md#VACUUM[VACUUM] SQL command)
== [[checkRetentionPeriodSafety]] checkRetentionPeriodSafety Method
[source, scala]
checkRetentionPeriodSafety(
  spark: SparkSession,
  retentionMs: Option[Long],
  configuredRetention: Long): Unit
checkRetentionPeriodSafety...FIXME
NOTE: checkRetentionPeriodSafety is used exclusively when VacuumCommand utility is requested to <
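Judging by the signature alone, a check of this kind would compare the requested retention against the configured one. The following is a minimal sketch under that assumption, not the actual implementation: the object name and error messages are hypothetical, and a plain checkEnabled flag replaces the SparkSession parameter (in Delta Lake such a flag is read from the spark.databricks.delta.retentionDurationCheck.enabled configuration property).

```scala
object RetentionCheckSketch {
  // checkEnabled is a hypothetical flag standing in for a Spark configuration lookup.
  def check(
      retentionMs: Option[Long],
      configuredRetention: Long,
      checkEnabled: Boolean): Unit = {
    // A negative retention makes no sense regardless of the flag.
    require(retentionMs.forall(_ >= 0), "Retention must not be negative")
    // No explicit retention (None) is treated as safe in this sketch.
    val retentionSafe = retentionMs.forall(_ >= configuredRetention)
    require(
      !checkEnabled || retentionSafe,
      s"Requested retention is shorter than the configured $configuredRetention ms")
  }
}
```

With the flag on, a retention shorter than the configured one fails with an IllegalArgumentException (thrown by require); with the flag off, only the non-negativity check remains.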
== [[logging]] Logging
Enable ALL logging level for org.apache.spark.sql.delta.commands.VacuumCommand logger to see what happens inside.
Add the following line to conf/log4j.properties:
[source,plaintext]
log4j.logger.org.apache.spark.sql.delta.commands.VacuumCommand=ALL
Refer to Logging.