

SnapshotManagement is an extension of DeltaLog that manages Snapshots.

Current Snapshot

SnapshotManagement uses the currentSnapshot registry for the most recently loaded Snapshot (of a Delta table).

currentSnapshot is initialized to the latest available Snapshot (using getSnapshotAtInit) as soon as DeltaLog is created, and is updated on demand.


currentSnapshot is used when:

  • SnapshotManagement is requested to...FIXME
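The registry pattern above can be sketched as follows. This is a minimal model, not Delta Lake's code: `SimpleSnapshot`, `SnapshotRegistry` and the `loadLatest` function are hypothetical stand-ins for Snapshot, SnapshotManagement and the loading of the latest Snapshot. The current snapshot is loaded eagerly when the object is created and only replaced when an update is requested.

```scala
// Hypothetical stand-in for a table snapshot (not Delta Lake's Snapshot class)
final case class SimpleSnapshot(version: Long)

// Hypothetical model of the currentSnapshot registry;
// loadLatest stands for loading the latest available snapshot
class SnapshotRegistry(loadLatest: () => SimpleSnapshot) {
  // Loaded eagerly, as soon as the registry (think: DeltaLog) is created
  @volatile private var current: SimpleSnapshot = loadLatest()

  def currentSnapshot: SimpleSnapshot = current

  // Updated on demand only: reload the latest snapshot and make it the current one
  def update(): SimpleSnapshot = synchronized {
    current = loadLatest()
    current
  }
}
```

Between updates, readers keep getting the snapshot loaded last, even if the underlying table has moved on to newer versions.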

Loading Latest Snapshot at Initialization

getSnapshotAtInit: Snapshot

getSnapshotAtInit uses getLogSegmentFrom to fetch the log segment for the last checkpoint.

getSnapshotAtInit prints out the following INFO message to the logs:

Loading version [version][startCheckpoint]

getSnapshotAtInit creates a Snapshot for the log segment.

getSnapshotAtInit records the current time in the lastUpdateTimestamp registry.

getSnapshotAtInit prints out the following INFO message to the logs:

Returning initial snapshot [snapshot]
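The initialization steps above can be sketched as a small model. Assumptions: `Segment`, `Snap` and `InitModel` are hypothetical stand-ins for LogSegment, Snapshot and SnapshotManagement, the last checkpoint is reduced to its version, `listSegment` plays the role of getLogSegmentFrom, and log messages are collected in a buffer instead of a logger (their exact wording here is illustrative).

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-ins for LogSegment and Snapshot (not Delta Lake's classes)
final case class Segment(version: Long, checkpointVersion: Option[Long])
final case class Snap(segment: Segment)

// listSegment stands for getLogSegmentFrom; lastCheckpoint is the last checkpoint's version (if any)
class InitModel(lastCheckpoint: Option[Long], listSegment: Option[Long] => Segment) {
  val logs = ArrayBuffer.empty[String] // stands in for INFO messages
  var lastUpdateTimestamp: Long = -1L  // -1 means "never updated"

  def getSnapshotAtInit(): Snap = {
    // 1. Fetch the log segment for the last checkpoint
    val segment = listSegment(lastCheckpoint)
    // Illustrative wording; the suffix is empty when there is no checkpoint
    val startCheckpoint =
      segment.checkpointVersion.map(v => s" starting from checkpoint $v").getOrElse("")
    logs += s"Loading version ${segment.version}$startCheckpoint"
    // 2. Create a snapshot for the log segment
    val snapshot = Snap(segment)
    // 3. Record the current time as the last successful update
    lastUpdateTimestamp = System.currentTimeMillis()
    logs += s"Returning initial snapshot $snapshot"
    snapshot
  }
}
```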

Fetching Log Files for Version Checkpointed

getLogSegmentFrom(
  startingCheckpoint: Option[CheckpointMetaData]): LogSegment

getLogSegmentFrom fetches the log files for the version (using getLogSegmentForVersion), with the version of the optional CheckpointMetaData as the checkpoint version to start listing log files from.
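A sketch of that delegation, assuming getLogSegmentFrom only turns the optional checkpoint into an optional starting version. `CheckpointMeta` is a hypothetical stand-in for CheckpointMetaData, and `forVersion` stands for the version-based lookup (getLogSegmentForVersion).

```scala
// Hypothetical stand-in for CheckpointMetaData; only its version is used here
final case class CheckpointMeta(version: Long, size: Long)

// Translate the optional checkpoint into an optional starting version and delegate
// to a version-based lookup (forVersion stands for getLogSegmentForVersion)
def segmentFrom[S](startingCheckpoint: Option[CheckpointMeta])(forVersion: Option[Long] => S): S =
  forVersion(startingCheckpoint.map(_.version))
```

With no checkpoint, `None` is passed along and the lookup is free to start listing from version 0.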

Fetching Latest Checkpoint and Delta Log Files for Version

getLogSegmentForVersion(
  startCheckpoint: Option[Long],
  versionToLoad: Option[Long] = None): LogSegment

getLogSegmentForVersion lists all the files (in the transaction log) starting from the given startCheckpoint version (or 0 if undefined).

getLogSegmentForVersion filters out unnecessary files and leaves checkpoint and delta files only.

getLogSegmentForVersion filters out checkpoint files of size 0.

getLogSegmentForVersion takes all the files up to and including the requested versionToLoad (if defined).

getLogSegmentForVersion splits the files into checkpoint and delta files.

getLogSegmentForVersion finds the latest checkpoint from the list.

In the end, getLogSegmentForVersion creates a LogSegment with the (checkpoint and delta) files.
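The filtering pipeline above can be sketched over an in-memory model of the log directory. Everything here is hypothetical (`LogFile`, the `FileKind` objects, `Segment`, `segmentForVersion`), not Delta Lake's API. One further assumption: the segment keeps only the delta files newer than the chosen checkpoint, since the checkpoint already covers the earlier ones.

```scala
// Hypothetical, simplified model of the files in a transaction log directory
sealed trait FileKind
case object DeltaFile extends FileKind
case object CheckpointFile extends FileKind
case object OtherFile extends FileKind // e.g. checksum or temporary files

final case class LogFile(version: Long, kind: FileKind, size: Long)
final case class Segment(deltas: Seq[LogFile], checkpoint: Option[LogFile])

def segmentForVersion(
    allFiles: Seq[LogFile],
    startCheckpoint: Option[Long],
    versionToLoad: Option[Long] = None): Segment = {
  val files = allFiles
    .sortBy(_.version)                                              // listing is ordered by version
    .filter(_.version >= startCheckpoint.getOrElse(0L))             // list from startCheckpoint (or 0)
    .filter(f => f.kind == CheckpointFile || f.kind == DeltaFile)   // checkpoint and delta files only
    .filterNot(f => f.kind == CheckpointFile && f.size == 0)        // drop empty checkpoint files
    .takeWhile(f => versionToLoad.forall(v => f.version <= v))      // stop after versionToLoad
  // Split into checkpoint and delta files (order is preserved)
  val (checkpoints, deltas) = files.partition(_.kind == CheckpointFile)
  // The latest checkpoint is the last one, as the files are sorted by version
  val latestCheckpoint = checkpoints.lastOption
  // Assumption: only deltas after the checkpoint are kept in the segment
  val newerDeltas = deltas.filter(d => latestCheckpoint.forall(c => d.version > c.version))
  Segment(newerDeltas, latestCheckpoint)
}
```

An empty checkpoint file (for example, one still being written) is skipped, so an older complete checkpoint is used instead.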

getLogSegmentForVersion is used when:

Listing Files from Version Upwards

  startVersion: Long): Iterator[FileStatus]


Creating Snapshot

createSnapshot(
  segment: LogSegment,
  minFileRetentionTimestamp: Long,
  timestamp: Long): Snapshot

createSnapshot reads the checksum (readChecksum) for the version of the given LogSegment and creates a Snapshot.
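A minimal sketch of these two steps, assuming the checksum lookup returns an optional result (no checksum for a version yields `None`). `Checksum`, `Segment`, `Snap` and the `readChecksum` function parameter are hypothetical stand-ins, and minFileRetentionTimestamp is left out for brevity.

```scala
// Hypothetical stand-ins (not Delta Lake's classes)
final case class Checksum(tableSizeBytes: Long, numFiles: Long)
final case class Segment(version: Long)
final case class Snap(version: Long, checksum: Option[Checksum], timestamp: Long)

// readChecksum stands for reading the checksum of a version; None when there is none
def createSnapshot(segment: Segment, timestamp: Long)(readChecksum: Long => Option[Checksum]): Snap =
  Snap(segment.version, readChecksum(segment.version), timestamp)
```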

createSnapshot is used when:

Last Successful Update Timestamp

SnapshotManagement uses the lastUpdateTimestamp internal registry for the timestamp of the last successful update.

Updating Current Snapshot

update(
  stalenessAcceptable: Boolean = false): Snapshot

update determines whether to update asynchronously or not based on the input stalenessAcceptable flag and isSnapshotStale.

With the stalenessAcceptable flag turned off (the default value) or the state snapshot stale, update updates synchronously (with the isAsync flag turned off).
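The decision can be sketched as a pure function, assuming an asynchronous update is only chosen when staleness is acceptable and the current snapshot is not stale yet; in every other case the caller waits for a synchronous update. `UpdateMode`, `Synchronous`, `Asynchronous` and `updateMode` are hypothetical names.

```scala
// Hypothetical names for the two ways of updating
sealed trait UpdateMode
case object Synchronous extends UpdateMode
case object Asynchronous extends UpdateMode

// Asynchronous only when staleness is acceptable AND the snapshot is not stale
def updateMode(stalenessAcceptable: Boolean, snapshotStale: Boolean): UpdateMode =
  if (stalenessAcceptable && !snapshotStale) Asynchronous else Synchronous
```

The default (`stalenessAcceptable = false`) therefore always gives a synchronous update, regardless of staleness.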



update is used when:


isSnapshotStale: Boolean

isSnapshotStale reads the configuration property for the staleness limit.

isSnapshotStale is enabled (true) when any of the following holds:

  1. The configuration property is 0 (the default)
  2. The internal lastUpdateTimestamp has never been updated (i.e., it is below 0) or is at least as old as the configuration property specifies
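The two conditions above can be sketched as a pure function. Assumptions: all values are milliseconds, `stalenessLimitMs` stands for the configuration property, and the current time is passed in explicitly (`nowMs`) to keep the function testable; all names are hypothetical.

```scala
// Hypothetical model: all times in milliseconds;
// stalenessLimitMs stands for the configuration property
def isSnapshotStale(stalenessLimitMs: Long, lastUpdateTimestamp: Long, nowMs: Long): Boolean =
  stalenessLimitMs == 0L ||                         // 1. limit of 0 (the default)
    lastUpdateTimestamp < 0L ||                     // 2a. never updated
    nowMs - lastUpdateTimestamp >= stalenessLimitMs // 2b. at least as old as the limit
```

With the default limit of 0, every snapshot counts as stale, so asynchronous updates never kick in unless the limit is raised.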


  isAsync: Boolean = false): Snapshot



updateInternal(
  isAsync: Boolean): Snapshot // (1)

  1. The isAsync flag is not used.

updateInternal requests the current Snapshot for its LogSegment, which is in turn requested for its checkpointVersion. updateInternal then gets the LogSegment for that checkpointVersion (using getLogSegmentForVersion).

If the LogSegments are equal (and so no new files have been added), updateInternal updates the lastUpdateTimestamp registry to the current timestamp and returns the currentSnapshot.

Otherwise, if the fetched LogSegment is different from the current Snapshot's, updateInternal prints out the following INFO message to the logs:

Loading version [version][ starting from checkpoint version [v]]

updateInternal creates a new Snapshot with the fetched LogSegment.

updateInternal replaces the current Snapshot with the new one (replaceSnapshot) and prints out the following INFO message to the logs:

Updated snapshot to [newSnapshot]
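The update flow above can be sketched with a small model. `Segment`, `Snap` and `Updater` are hypothetical stand-ins, `listSegment` plays the role of getLogSegmentForVersion, `now` replaces the system clock, and log messages go to a buffer. Refreshing lastUpdateTimestamp after a successful replacement (not only on the early exit) is an assumption of this sketch.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-ins for LogSegment and Snapshot (not Delta Lake's classes)
final case class Segment(version: Long, checkpointVersion: Option[Long], files: Seq[String])
final case class Snap(segment: Segment)

// listSegment stands for getLogSegmentForVersion; now stands for the clock
class Updater(initial: Snap, listSegment: Option[Long] => Segment, now: () => Long) {
  @volatile var currentSnapshot: Snap = initial
  var lastUpdateTimestamp: Long = -1L
  val logs = ArrayBuffer.empty[String] // stands in for INFO messages

  def updateInternal(): Snap = synchronized {
    // Fetch the log segment from the current snapshot's checkpoint version
    val segment = listSegment(currentSnapshot.segment.checkpointVersion)
    if (segment == currentSnapshot.segment) {
      // Same segment, so no new files: only refresh the timestamp
      lastUpdateTimestamp = now()
      currentSnapshot
    } else {
      val fromCheckpoint =
        segment.checkpointVersion.map(v => s" starting from checkpoint version $v").getOrElse("")
      logs += s"Loading version ${segment.version}$fromCheckpoint"
      val newSnapshot = Snap(segment)
      currentSnapshot = newSnapshot // stands in for replaceSnapshot
      lastUpdateTimestamp = now()   // assumption: refreshed after a successful update, too
      logs += s"Updated snapshot to $newSnapshot"
      newSnapshot
    }
  }
}
```

Comparing whole segments keeps the early exit cheap: when nothing was committed since the last update, no new snapshot is created.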

Replacing Snapshots

replaceSnapshot(
  newSnapshot: Snapshot): Unit

replaceSnapshot requests the currentSnapshot to uncache (and drop any cached data) and makes the given newSnapshot the current one.
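A minimal sketch of the replacement, assuming a snapshot exposes an `uncache` operation that drops its cached data. `CachedSnap` and `Holder` are hypothetical stand-ins, and caching is reduced to a flag.

```scala
// Hypothetical snapshot with a cache flag; uncache stands for dropping cached data
final class CachedSnap(val version: Long) {
  private var cached = true
  def isCached: Boolean = cached
  def uncache(): Unit = { cached = false }
}

// Hypothetical holder of the current snapshot
class Holder(initial: CachedSnap) {
  @volatile var currentSnapshot: CachedSnap = initial

  def replaceSnapshot(newSnapshot: CachedSnap): Unit = synchronized {
    currentSnapshot.uncache()     // release the old snapshot's cached data first
    currentSnapshot = newSnapshot // then make the new snapshot the current one
  }
}
```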


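Demo

import org.apache.spark.sql.delta.{DeltaLog, SnapshotManagement}

// In spark-shell; dataPath is the path of an existing Delta table with a single commit (version 0)
val log = DeltaLog.forTable(spark, dataPath)
assert(log.isInstanceOf[SnapshotManagement], "DeltaLog is a SnapshotManagement")

// Update synchronously (staleness is not acceptable)
val snapshot = log.update(stalenessAcceptable = false)

scala> :type snapshot
org.apache.spark.sql.delta.Snapshot

assert(snapshot.version == 0)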


Logging

As SnapshotManagement is an extension of DeltaLog, enable DeltaLog logging to see what happens inside.