InMemoryLogReplay
InMemoryLogReplay is used at the very last phase of state reconstruction (of a cached delta state).
InMemoryLogReplay handles a single partition of the state reconstruction dataset (the number of partitions is based on the spark.databricks.delta.snapshotPartitions configuration property).
Creating Instance
InMemoryLogReplay takes the following to be created:
- minFileRetentionTimestamp (Snapshot.minFileRetentionTimestamp)

InMemoryLogReplay is created when:
- Snapshot is requested for state reconstruction
Lifecycle
The lifecycle of InMemoryLogReplay is as follows:
- Append all SingleActions of a partition (based on the spark.databricks.delta.snapshotPartitions configuration property)
Replaying Version
```scala
append(
  version: Long,
  actions: Iterator[Action]): Unit
```
append sets the currentVersion to the given version.
append adds the given actions to their respective registries.
Action | Registry |
---|---|
SetTransaction | transactions by appId |
Metadata | currentMetaData |
Protocol | currentProtocolVersion |
AddFile | 1. Added to activeFiles by path, with the dataChange flag disabled; 2. The path is removed from tombstones (so there's only one FileAction for a path) |
RemoveFile | 1. The path is removed from activeFiles (so there's only one FileAction for a path); 2. Added to tombstones by path, with the dataChange flag disabled |
CommitInfo | Ignored |
AddCDCFile | Ignored |
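
The action-to-registry mapping in the table above can be sketched as a single pattern match. This is a minimal illustration, not the Delta Lake implementation: the case classes are simplified, hypothetical stand-ins for the real action classes, paths are keyed as plain strings (an assumption), and the version bookkeeping is left out.

```scala
import scala.collection.mutable

object ReplaySketch {
  // Simplified, hypothetical stand-ins for Delta Lake's action classes.
  sealed trait Action
  final case class SetTransaction(appId: String, version: Long) extends Action
  final case class Metadata(id: String) extends Action
  final case class Protocol(minReaderVersion: Int, minWriterVersion: Int) extends Action
  final case class AddFile(path: String, dataChange: Boolean) extends Action
  final case class RemoveFile(path: String, dataChange: Boolean) extends Action
  final case class CommitInfo(operation: String) extends Action
  final case class AddCDCFile(path: String) extends Action

  final class Replay {
    var currentProtocolVersion: Protocol = null
    var currentMetaData: Metadata = null
    val transactions = mutable.HashMap.empty[String, SetTransaction]
    val activeFiles = mutable.HashMap.empty[String, AddFile]
    val tombstones = mutable.HashMap.empty[String, RemoveFile]

    def append(actions: Iterator[Action]): Unit = actions.foreach {
      case t: SetTransaction => transactions(t.appId) = t
      case m: Metadata => currentMetaData = m
      case p: Protocol => currentProtocolVersion = p
      case a: AddFile =>
        // Register the file as active and drop any tombstone for the same path,
        // so a path is represented by exactly one FileAction.
        activeFiles(a.path) = a.copy(dataChange = false)
        tombstones.remove(a.path)
      case r: RemoveFile =>
        // The mirror image: the path leaves activeFiles and becomes a tombstone.
        activeFiles.remove(r.path)
        tombstones(r.path) = r.copy(dataChange = false)
      case _: CommitInfo | _: AddCDCFile => // ignored
    }
  }
}
```

With this sketch, replaying an AddFile, then a RemoveFile, then another AddFile for the same path leaves that path in activeFiles only, with no tombstone.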
append throws an AssertionError unless the currentVersion is -1 (nothing has been replayed yet) or one before the given version:
Attempted to replay version [version], but state is at [currentVersion]
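
The version check can be illustrated in isolation. The sketch below assumes the check is a plain Scala assert that runs before currentVersion is updated. VersionTracker is a hypothetical name used only for this example.

```scala
object VersionCheckSketch {
  final class VersionTracker {
    // -1 means no version has been replayed yet.
    var currentVersion: Long = -1L

    def append(version: Long): Unit = {
      // Accept any first version, then require consecutive versions.
      assert(
        currentVersion == -1 || version == currentVersion + 1,
        s"Attempted to replay version $version, but state is at $currentVersion")
      currentVersion = version
    }
  }
}
```

Under these assumptions a fresh tracker accepts any starting version (for example 5), then 6, and rejects 8 with an AssertionError because the state is at 6.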
Current State of Delta Table
```scala
checkpoint: Iterator[Action]
```
checkpoint returns an Iterator (Scala) of Actions in the following order:
- currentProtocolVersion if defined (non-null)
- currentMetaData if defined (non-null)
- SetTransactions
- AddFiles and RemoveFiles sorted by path (lexicographically)
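
The ordering above can be sketched as a chain of iterators. The action classes here are simplified, hypothetical stand-ins, and the registries are passed in as parameters (an assumption made to keep the example self-contained). Option(...) turns a null protocol or metadata into an empty iterator.

```scala
object CheckpointSketch {
  // Simplified, hypothetical stand-ins for Delta Lake's action classes.
  sealed trait Action
  final case class Protocol(minReaderVersion: Int, minWriterVersion: Int) extends Action
  final case class Metadata(id: String) extends Action
  final case class SetTransaction(appId: String, version: Long) extends Action
  sealed trait FileAction extends Action { def path: String }
  final case class AddFile(path: String) extends FileAction
  final case class RemoveFile(path: String) extends FileAction

  def checkpoint(
      protocol: Protocol,
      metadata: Metadata,
      transactions: Seq[SetTransaction],
      files: Seq[FileAction]): Iterator[Action] =
    Option(protocol).iterator ++          // skipped when null
      Option(metadata).iterator ++        // skipped when null
      transactions.iterator ++
      files.sortBy(_.path).iterator       // AddFiles and RemoveFiles interleaved by path
}
```

Because AddFiles and RemoveFiles are sorted together, a RemoveFile for path "b" comes after an AddFile for path "a" regardless of the order in which they were replayed.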
getTombstones
```scala
getTombstones: Iterable[FileAction]
```
getTombstones uses the tombstones internal registry for RemoveFiles with deletionTimestamp after (greater than) the minFileRetentionTimestamp.
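
The retention filter amounts to a strict greater-than comparison. In this sketch the tombstones are passed in as a parameter and RemoveFile is a simplified, hypothetical case class with a plain Long timestamp (both assumptions).

```scala
object TombstoneSketch {
  // Hypothetical, simplified RemoveFile: only the fields needed for the filter.
  final case class RemoveFile(path: String, deletionTimestamp: Long)

  def getTombstones(
      tombstones: Iterable[RemoveFile],
      minFileRetentionTimestamp: Long): Iterable[RemoveFile] =
    // Keep only tombstones deleted strictly after the retention threshold.
    tombstones.filter(_.deletionTimestamp > minFileRetentionTimestamp)
}
```

A tombstone whose deletionTimestamp equals the threshold is dropped, since the comparison is strict.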