DeltaHistoryManager¶
DeltaHistoryManager is used to access the version and commit history of a delta table.
Creating Instance¶
DeltaHistoryManager takes the following to be created:
- DeltaLog
- Maximum number of keys per List API call (default: 1000)
DeltaHistoryManager is created when:
- DeltaLog is requested for the DeltaHistoryManager
Maximum Number of Keys (per List API Call)¶
DeltaHistoryManager can be given the maximum number of keys per List API call (maxKeysPerList) when created.
Unless given, maxKeysPerList is 1000.
The value of maxKeysPerList can be configured using the spark.databricks.delta.history.maxKeysPerList configuration property.
maxKeysPerList is used to look up the active commit at a given time (in parallelSearch).
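The default-and-override behaviour described above can be pictured with a small, self-contained sketch. It is only an illustration: the object name MaxKeysPerListSketch and the resolve helper are hypothetical, and a plain Map stands in for the Spark configuration. Only the property name and the default of 1000 come from the text above.

```scala
object MaxKeysPerListSketch {
  // Property name as given in the text above.
  val Key = "spark.databricks.delta.history.maxKeysPerList"
  // Default value as given in the text above.
  val Default = 1000

  /** Returns the configured value, or the default when the property is not set.
    * `conf` is a hypothetical stand-in for the session configuration. */
  def resolve(conf: Map[String, String]): Int =
    conf.get(Key).map(_.trim.toInt).getOrElse(Default)
}
```

In a Spark application the property would typically be set on the session configuration (for example with spark.conf.set or a --conf option) rather than in a Map.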
Table History (Versions)¶
getHistory(
start: Long,
end: Option[Long] = None): Seq[CommitInfo]
getHistory(
limitOpt: Option[Int]): Seq[CommitInfo]
getHistory...FIXME
getHistory is used when:
- DeltaTableOperations is requested to executeHistory (for DeltaTable.history operator)
- DescribeDeltaHistoryCommand is executed (for DESCRIBE HISTORY SQL command)
getCommitInfo¶
getCommitInfo(
logStore: LogStore,
basePath: Path,
version: Long): CommitInfo
getCommitInfo...FIXME
getActiveCommitAtTime¶
getActiveCommitAtTime(
timestamp: Timestamp,
canReturnLastCommit: Boolean,
mustBeRecreatable: Boolean = true,
canReturnEarliestCommit: Boolean = false): Commit
getActiveCommitAtTime determines the earliest commit to search from based on the given mustBeRecreatable flag (default: true):
- When enabled (default), getActiveCommitAtTime uses getEarliestRecreatableCommit
- When disabled, getActiveCommitAtTime uses getEarliestDeltaFile
getActiveCommitAtTime requests the DeltaLog to update, which gives the latest Snapshot, and takes the version of that Snapshot as the latest version.
getActiveCommitAtTime then finds the commit between the earliest and the latest versions. Depending on how many commits there are to fetch (relative to maxKeysPerList), getActiveCommitAtTime either uses parallelSearch or does not.
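The lookup can be sketched in plain Scala as follows. This is a simplified illustration under stated assumptions, not the Delta Lake implementation: Commit here is a hypothetical two-field stand-in, the names ActiveCommitSketch, needsParallelSearch and lastCommitAtOrBefore are made up for the example, and the exact rule for switching to parallelSearch (more versions than maxKeysPerList) is an assumption, since the text above does not state it.

```scala
// Hypothetical, simplified stand-in for a commit: a version and its commit timestamp.
final case class Commit(version: Long, timestamp: Long)

object ActiveCommitSketch {
  /** Assumed rule: the range is searched in parallel when it spans
    * more versions than maxKeysPerList. */
  def needsParallelSearch(earliest: Long, latest: Long, maxKeysPerList: Int): Boolean =
    latest - earliest > maxKeysPerList

  /** Latest commit whose timestamp is at or before `time`, if any.
    * Assumes `commits` is sorted by version with non-decreasing timestamps. */
  def lastCommitAtOrBefore(commits: Seq[Commit], time: Long): Option[Commit] =
    commits.takeWhile(_.timestamp <= time).lastOption
}
```

For example, with commits at timestamps 100, 200 and 300, a lookup for time 250 selects the commit at 200, while a lookup for time 50 finds no commit at all. How that last case is handled is presumably where flags such as canReturnEarliestCommit come in.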
getActiveCommitAtTime...FIXME
getActiveCommitAtTime is used when:
- DeltaTableUtils utility is used to resolveTimeTravelVersion
- DeltaSource is requested for getStartingVersion
parallelSearch¶
parallelSearch(
time: Long,
start: Long,
end: Long): Commit
parallelSearch finds the latest Commit that happened at or before the given time in the range [start, end).
parallelSearch uses parallelSearch0.
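The idea of searching the half-open range [start, end) in chunks can be sketched as below. This is a minimal illustration, not the actual parallelSearch0: it swaps in Scala Futures for whatever parallelism the real code uses, and the names ParallelSearchSketch, timestampOf and step are hypothetical. Unlike the signature above, the sketch returns an Option so that "no commit at or before the given time" is explicit.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical, simplified stand-in for a commit: a version and its commit timestamp.
final case class Commit(version: Long, timestamp: Long)

object ParallelSearchSketch {
  /** Finds the latest commit at or before `time` among versions in [start, end).
    * `timestampOf` is a hypothetical lookup from a version to its commit timestamp;
    * `step` is the (assumed) number of versions scanned per chunk. */
  def parallelSearch(
      timestampOf: Long => Long,
      time: Long,
      start: Long,
      end: Long,
      step: Long): Option[Commit] = {
    require(step > 0, "step must be positive")
    // Split [start, end) into half-open chunks of at most `step` versions.
    val chunks = (start until end by step).toVector.map(lo => (lo, math.min(lo + step, end)))
    // Scan every chunk concurrently for its latest commit at or before `time`.
    val perChunk = Future.traverse(chunks) { case (lo, hi) =>
      Future {
        (lo until hi)
          .map(v => Commit(v, timestampOf(v)))
          .filter(_.timestamp <= time)
          .lastOption
      }
    }
    // Chunks are in ascending version order, so the last candidate has the highest version.
    Await.result(perChunk, 30.seconds).flatten.lastOption
  }
}
```

Chunking keeps each unit of work bounded, which mirrors the role a per-call key limit such as maxKeysPerList would play when listing commit files.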
parallelSearch0¶
parallelSearch0
parallelSearch0...FIXME