
InMemoryLogReplay

InMemoryLogReplay is used at the very last phase of state reconstruction (of a cached delta state).

InMemoryLogReplay handles a single partition of the state reconstruction dataset (based on the spark.databricks.delta.snapshotPartitions configuration property).
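As a configuration fragment, the property can be set when the SparkSession is built. This is only a sketch: it assumes Spark is on the classpath, and the value 10, the application name, and the local master are arbitrary placeholders rather than recommendations.

```scala
import org.apache.spark.sql.SparkSession

object SnapshotPartitionsExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")                    // assumption: local run for illustration
      .appName("snapshot-partitions-sketch") // hypothetical application name
      // Number of partitions of the state reconstruction dataset (10 is arbitrary)
      .config("spark.databricks.delta.snapshotPartitions", "10")
      .getOrCreate()

    // Read the property back to confirm what the session sees
    println(spark.conf.get("spark.databricks.delta.snapshotPartitions"))
    spark.stop()
  }
}
```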

Creating Instance

InMemoryLogReplay takes a minFileRetentionTimestamp to be created.

InMemoryLogReplay is created when:

Lifecycle

The lifecycle of InMemoryLogReplay is as follows:

  1. Created (with Snapshot.minFileRetentionTimestamp)

  2. Append all SingleActions of a partition (based on the spark.databricks.delta.snapshotPartitions configuration property)

  3. Checkpoint

Replaying Version

append(
  version: Long,
  actions: Iterator[Action]): Unit

append sets the currentVersion to the given version.

append adds the given actions to their respective registries.

Action          Registry
--------------  ------------------------------------------------------------
SetTransaction  transactions by appId
Metadata        currentMetaData
Protocol        currentProtocolVersion
AddFile         1. activeFiles by path and with dataChange flag disabled
                2. Removes the path from tombstones (so there's only one FileAction for a path)
RemoveFile      1. Removes the path from activeFiles (so there's only one FileAction for a path)
                2. tombstones by path and with dataChange flag disabled
CommitInfo      Ignored
AddCDCFile      Ignored

append throws an AssertionError unless the currentVersion is -1 (nothing replayed yet) or one before the given version:

Attempted to replay version [version], but state is at [currentVersion]
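The behaviour of append can be sketched roughly as follows. This is a minimal illustration, not the Delta Lake source: the action case classes are simplified, hypothetical stand-ins (the real ones carry many more fields), LogReplaySketch is an invented name, and Option is used for the metadata and protocol registries purely for convenience.

```scala
import scala.collection.mutable

// Hypothetical, simplified stand-ins for Delta Lake's actions
sealed trait Action
final case class SetTransaction(appId: String, version: Long) extends Action
final case class Metadata(id: String) extends Action
final case class Protocol(minReaderVersion: Int, minWriterVersion: Int) extends Action
final case class AddFile(path: String, dataChange: Boolean = true) extends Action
final case class RemoveFile(path: String, deletionTimestamp: Long, dataChange: Boolean = true) extends Action
final case class CommitInfo(operation: String) extends Action
final case class AddCDCFile(path: String) extends Action

class LogReplaySketch {
  var currentVersion: Long = -1L
  var currentMetaData: Option[Metadata] = None
  var currentProtocolVersion: Option[Protocol] = None
  val transactions = mutable.HashMap.empty[String, SetTransaction]
  val activeFiles = mutable.HashMap.empty[String, AddFile]
  val tombstones = mutable.HashMap.empty[String, RemoveFile]

  def append(version: Long, actions: Iterator[Action]): Unit = {
    // Versions have to be replayed consecutively (or this is the first one)
    assert(currentVersion == -1L || version == currentVersion + 1,
      s"Attempted to replay version $version, but state is at $currentVersion")
    currentVersion = version
    actions.foreach {
      case t: SetTransaction => transactions(t.appId) = t
      case m: Metadata       => currentMetaData = Some(m)
      case p: Protocol       => currentProtocolVersion = Some(p)
      case a: AddFile =>
        // Register as active and drop any tombstone for the same path
        activeFiles(a.path) = a.copy(dataChange = false)
        tombstones.remove(a.path)
      case r: RemoveFile =>
        // Drop the active file and register a tombstone for the same path
        activeFiles.remove(r.path)
        tombstones(r.path) = r.copy(dataChange = false)
      case _: CommitInfo | _: AddCDCFile => () // ignored
    }
  }
}
```

Replaying version 0 with an AddFile and then version 1 with a RemoveFile for the same path leaves activeFiles empty and a single tombstone for that path, while skipping a version (e.g. going from 1 straight to 5) trips the assertion.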

Current State of Delta Table

checkpoint: Iterator[Action]

checkpoint returns an Iterator (Scala) of Actions in the following order:

getTombstones

getTombstones: Iterable[FileAction]

getTombstones returns the RemoveFiles in the tombstones internal registry whose deletionTimestamp is after (greater than) the minFileRetentionTimestamp.
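The retention filter can be illustrated with a small sketch. TombstoneSketch, Registry, and the two-field RemoveFile below are hypothetical simplifications introduced for this example only. They are not the Delta Lake classes.

```scala
import scala.collection.mutable

object TombstoneSketch {
  // Hypothetical, simplified stand-in for Delta Lake's RemoveFile action
  final case class RemoveFile(path: String, deletionTimestamp: Long)

  class Registry(minFileRetentionTimestamp: Long) {
    private val tombstones = mutable.HashMap.empty[String, RemoveFile]

    // Registers (or overwrites) the tombstone for the file's path
    def remove(file: RemoveFile): Unit = tombstones.update(file.path, file)

    // Keeps only tombstones deleted strictly after the retention timestamp
    def getTombstones: Iterable[RemoveFile] =
      tombstones.values.filter(_.deletionTimestamp > minFileRetentionTimestamp)
  }
}
```

With a retention timestamp of 100, a tombstone with deletionTimestamp 100 is filtered out (the comparison is strict), while one with deletionTimestamp 101 is returned.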