InMemoryLogReplay is used at the very last phase of state reconstruction (of a cached delta state).
InMemoryLogReplay handles a single partition of the state reconstruction dataset (based on the spark.databricks.delta.snapshotPartitions configuration property).
InMemoryLogReplay takes a minFileRetentionTimestamp to be created.
InMemoryLogReplay is created when:
- Snapshot is requested for state reconstruction
The lifecycle of InMemoryLogReplay is as follows:
Append all SingleActions of a partition (based on the spark.databricks.delta.snapshotPartitions configuration property)
append(version: Long, actions: Iterator[Action]): Unit
append sets the currentVersion to the given version.
append adds the given actions to their respective registries.
| Action | Registry |
|---|---|
| SetTransaction | transactions by appId |
| AddFile | 1. activeFiles by path and with dataChange flag disabled<br>2. Removes the path from tombstones (so there's only one FileAction for a path) |
| RemoveFile | 1. Removes the path from activeFiles (so there's only one FileAction for a path)<br>2. tombstones by path and with dataChange flag disabled |
append throws an AssertionError unless the currentVersion is -1 (nothing replayed yet) or exactly one before the given version. The error message is:

Attempted to replay version [version], but state is at [currentVersion]
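The behaviour of append described above can be sketched as follows. This is a minimal illustration, not the Delta Lake source: the action case classes are simplified stand-ins with only the fields needed here, and `ReplaySketch` is a hypothetical name. The sketch assumes the version check passes when the state is still at -1 or when the given version directly follows the current one.

```scala
import scala.collection.mutable

// Simplified stand-ins for Delta Lake actions (assumptions for illustration only).
sealed trait Action
final case class SetTransaction(appId: String, version: Long) extends Action
final case class AddFile(path: String, dataChange: Boolean = true) extends Action
final case class RemoveFile(path: String, dataChange: Boolean = true) extends Action

// Hypothetical class that mirrors the registries described in the table.
class ReplaySketch {
  var currentVersion: Long = -1L
  val transactions = mutable.HashMap.empty[String, SetTransaction]
  val activeFiles = mutable.HashMap.empty[String, AddFile]
  val tombstones = mutable.HashMap.empty[String, RemoveFile]

  def append(version: Long, actions: Iterator[Action]): Unit = {
    // Versions have to be replayed in order: either nothing has been replayed yet
    // or the given version directly follows the current one.
    assert(
      currentVersion == -1 || version == currentVersion + 1,
      s"Attempted to replay version $version, but state is at $currentVersion")
    currentVersion = version
    actions.foreach {
      case txn: SetTransaction =>
        transactions(txn.appId) = txn
      case add: AddFile =>
        // Register the file with dataChange disabled and drop any tombstone for the path.
        activeFiles(add.path) = add.copy(dataChange = false)
        tombstones.remove(add.path)
      case remove: RemoveFile =>
        // Drop the active file and register the tombstone with dataChange disabled.
        activeFiles.remove(remove.path)
        tombstones(remove.path) = remove.copy(dataChange = false)
    }
  }
}
```

Keeping activeFiles and tombstones mutually exclusive per path is what guarantees a single FileAction for any path once all versions have been replayed.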
Current State of Delta Table
checkpoint returns an Iterator (Scala) of Actions in the following order:
- currentProtocolVersion if defined (non-null)
- currentMetaData if defined (non-null)
- AddFiles and RemoveFiles sorted by path (lexicographically)
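The ordering listed above could look roughly like the sketch below. It covers only the items named in the list. The case classes, the `CheckpointSketch` object and the parameter list are assumptions made for illustration; the real method reads the internal registries rather than taking arguments.

```scala
// Simplified stand-ins for Delta Lake actions (assumptions for illustration only).
sealed trait Action
sealed trait FileAction extends Action { def path: String }
final case class Protocol(minReaderVersion: Int, minWriterVersion: Int) extends Action
final case class Metadata(id: String) extends Action
final case class AddFile(path: String) extends FileAction
final case class RemoveFile(path: String) extends FileAction

object CheckpointSketch {
  // The protocol and metadata arguments may be null, which stands for "not defined".
  def checkpoint(
      currentProtocolVersion: Protocol,
      currentMetaData: Metadata,
      activeFiles: Seq[AddFile],
      tombstones: Seq[RemoveFile]): Iterator[Action] = {
    // Option(null) is None, so undefined entries contribute nothing.
    val protocol: Iterator[Action] = Option(currentProtocolVersion).iterator
    val metadata: Iterator[Action] = Option(currentMetaData).iterator
    // AddFiles and RemoveFiles are merged and sorted lexicographically by path.
    val allFiles: Seq[FileAction] = activeFiles ++ tombstones
    val sortedFiles: Iterator[Action] = allFiles.sortBy(_.path).iterator
    protocol ++ metadata ++ sortedFiles
  }
}
```

Wrapping a possibly-null reference in `Option` keeps the "if defined" rule in one place and avoids explicit null checks.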
getTombstones uses the tombstones internal registry for RemoveFiles with deletionTimestamp after (greater than) the minFileRetentionTimestamp.
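The retention filter can be illustrated with a small sketch. The `RemoveFile` case class here is a simplified stand-in that models deletionTimestamp as a plain Long, and `TombstoneSketch` with its explicit parameters is a hypothetical helper, not the actual method signature.

```scala
// Simplified stand-in for a Delta Lake RemoveFile (assumption for illustration only).
final case class RemoveFile(path: String, deletionTimestamp: Long)

object TombstoneSketch {
  // Keeps only the tombstones deleted strictly after the retention threshold.
  def getTombstones(
      tombstones: Iterable[RemoveFile],
      minFileRetentionTimestamp: Long): Iterable[RemoveFile] =
    tombstones.filter(_.deletionTimestamp > minFileRetentionTimestamp)
}
```

Because the comparison is strict, a RemoveFile whose deletionTimestamp equals minFileRetentionTimestamp is not returned.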