InMemoryLogReplay¶
InMemoryLogReplay is used at the very last phase of state reconstruction (of a cached delta state).
InMemoryLogReplay handles a single partition of the state reconstruction dataset (based on the spark.databricks.delta.snapshotPartitions configuration property).
Creating Instance¶
InMemoryLogReplay takes the following to be created:
- minFileRetentionTimestamp (Snapshot.minFileRetentionTimestamp)
InMemoryLogReplay is created when:
- Snapshot is requested for state reconstruction
Lifecycle¶
The lifecycle of InMemoryLogReplay is as follows:
- Append all SingleActions of a partition (based on the spark.databricks.delta.snapshotPartitions configuration property)
Replaying Version¶
append(
version: Long,
actions: Iterator[Action]): Unit
append sets the currentVersion to the given version.
append adds the given actions to their respective registries.
| Action | Registry |
|---|---|
| SetTransaction | transactions by appId |
| Metadata | currentMetaData |
| Protocol | currentProtocolVersion |
| AddFile | 1. Adds to activeFiles by path, with the dataChange flag disabled; 2. Removes the path from tombstones (so there's only one FileAction for a path) |
| RemoveFile | 1. Removes the path from activeFiles (so there's only one FileAction for a path); 2. Adds to tombstones by path, with the dataChange flag disabled |
| CommitInfo | Ignored |
| AddCDCFile | Ignored |
append throws an AssertionError unless the currentVersion is -1 (nothing replayed yet) or exactly one before the given version:
Attempted to replay version [version], but state is at [currentVersion]
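The registry updates and the version check can be sketched in a few lines of Scala. This is a minimal illustration only, not Delta Lake's implementation: the action classes are hypothetical stand-ins trimmed to the fields used here, `ReplaySketch` is an invented name, and paths are plain strings.

```scala
import scala.collection.mutable

// Hypothetical, trimmed-down stand-ins for Delta Lake's action classes.
sealed trait Action
sealed trait FileAction extends Action { def path: String; def dataChange: Boolean }
final case class SetTransaction(appId: String, version: Long) extends Action
final case class Metadata(id: String) extends Action
final case class Protocol(minReaderVersion: Int, minWriterVersion: Int) extends Action
final case class AddFile(path: String, dataChange: Boolean) extends FileAction
final case class RemoveFile(path: String, deletionTimestamp: Long, dataChange: Boolean)
  extends FileAction
final case class CommitInfo(operation: String) extends Action

// Illustrative replay state; the names mirror the registries in the table above.
class ReplaySketch {
  var currentVersion: Long = -1L
  var currentProtocolVersion: Protocol = null
  var currentMetaData: Metadata = null
  val transactions = mutable.HashMap.empty[String, SetTransaction]
  val activeFiles = mutable.HashMap.empty[String, AddFile]
  val tombstones = mutable.HashMap.empty[String, RemoveFile]

  def append(version: Long, actions: Iterator[Action]): Unit = {
    // Versions must be replayed in order: either nothing has been replayed yet (-1)
    // or the given version directly follows the current one.
    assert(currentVersion == -1 || version == currentVersion + 1,
      s"Attempted to replay version $version, but state is at $currentVersion")
    currentVersion = version
    actions.foreach {
      case t: SetTransaction => transactions(t.appId) = t
      case m: Metadata => currentMetaData = m
      case p: Protocol => currentProtocolVersion = p
      case a: AddFile =>
        // Keep a single FileAction per path: the add wins over an earlier remove.
        activeFiles(a.path) = a.copy(dataChange = false)
        tombstones.remove(a.path)
      case r: RemoveFile =>
        // Keep a single FileAction per path: the remove wins over an earlier add.
        activeFiles.remove(r.path)
        tombstones(r.path) = r.copy(dataChange = false)
      case _: CommitInfo => () // ignored
    }
  }
}
```

In this sketch, replaying version 0 and then version 1 succeeds, while skipping ahead (say, from 1 to 5) fails the assertion.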
Current State of Delta Table¶
checkpoint: Iterator[Action]
checkpoint returns an Iterator (Scala) of Actions in the following order:
- currentProtocolVersion if defined (non-null)
- currentMetaData if defined (non-null)
- SetTransactions
- AddFiles and RemoveFiles sorted by path (lexicographically)
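The ordering above can be illustrated with a small, self-contained sketch. It assumes simplified, hypothetical action classes and takes the registries as plain parameters; `CheckpointSketch` is an invented name, and the relative order of the SetTransactions is left to whatever the input collection provides.

```scala
// Hypothetical, trimmed-down stand-ins for Delta Lake's action classes.
sealed trait Action
sealed trait FileAction extends Action { def path: String }
final case class Protocol(minReaderVersion: Int, minWriterVersion: Int) extends Action
final case class Metadata(id: String) extends Action
final case class SetTransaction(appId: String, version: Long) extends Action
final case class AddFile(path: String) extends FileAction
final case class RemoveFile(path: String, deletionTimestamp: Long) extends FileAction

object CheckpointSketch {
  // Emits protocol, metadata, transactions, then file actions sorted by path.
  def checkpoint(
      protocol: Protocol,   // may be null (not defined)
      metadata: Metadata,   // may be null (not defined)
      transactions: Iterable[SetTransaction],
      files: Iterable[FileAction]): Iterator[Action] = {
    // Option(null) is None, so undefined entries contribute nothing.
    val protocolPart: Iterator[Action] = Option(protocol).iterator
    val metadataPart: Iterator[Action] = Option(metadata).iterator
    val filesPart: Iterator[Action] = files.toSeq.sortBy(_.path).iterator
    protocolPart ++ metadataPart ++ transactions.iterator ++ filesPart
  }
}
```

Sorting by path interleaves AddFiles and RemoveFiles; the sketch does not group them by type.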
getTombstones¶
getTombstones: Iterable[FileAction]
getTombstones returns the RemoveFiles from the tombstones internal registry whose deletionTimestamp is after (greater than) the minFileRetentionTimestamp.
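As a rough illustration of the retention filter, the following sketch uses a hypothetical, simplified `RemoveFile` and passes the registry and threshold as parameters (in InMemoryLogReplay they are internal state and a constructor argument, respectively). `TombstoneSketch` is an invented name.

```scala
// Hypothetical, trimmed-down stand-in for Delta Lake's RemoveFile action.
final case class RemoveFile(path: String, deletionTimestamp: Long)

object TombstoneSketch {
  // Keeps only tombstones deleted strictly after the retention threshold.
  def getTombstones(
      tombstones: Map[String, RemoveFile],
      minFileRetentionTimestamp: Long): Iterable[RemoveFile] =
    tombstones.values.filter(_.deletionTimestamp > minFileRetentionTimestamp)
}
```

Because the comparison is strict, a tombstone whose deletionTimestamp equals the threshold is dropped in this sketch.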