DeltaHistoryManager¶
DeltaHistoryManager is used for the version and commit history of a delta table.
Creating Instance¶
DeltaHistoryManager takes the following to be created:
DeltaHistoryManager is created when:
- DeltaLog is requested for the DeltaHistoryManager
Maximum Number of Keys (per List API Call)¶
DeltaHistoryManager can be given the maximum number of keys per List API call when created.
Unless given, maxKeysPerList is 1000.
The value of maxKeysPerList can be configured using the spark.databricks.delta.history.maxKeysPerList configuration property.
maxKeysPerList is used to look up the active commit at a given time (in parallelSearch).
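As a rough illustration, the property could be set on a Spark session. This is only a sketch that assumes Spark and Delta Lake are on the classpath. The application name, the master URL and the value 500 are made up for the example, and it does not show at which point DeltaHistoryManager reads the value.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical local session; appName and master are placeholders.
val spark = SparkSession.builder()
  .appName("history-demo")
  .master("local[*]")
  .getOrCreate()

// Property name as documented above; 500 is an arbitrary example value.
spark.conf.set("spark.databricks.delta.history.maxKeysPerList", "500")
```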
Table History (Versions)¶
getHistory(
start: Long,
end: Option[Long] = None): Seq[CommitInfo]
getHistory(
limitOpt: Option[Int]): Seq[CommitInfo]
getHistory ...FIXME
getHistory is used when:
- DeltaTableOperations is requested to executeHistory (for DeltaTable.history operator)
- DescribeDeltaHistoryCommand is executed (for DESCRIBE HISTORY SQL command)
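The two entry points listed above can be exercised from user code roughly as follows. This is a sketch that assumes Spark and Delta Lake are on the classpath and that a delta table already exists at the given location. The session settings, the path and the limit of 5 are placeholders chosen for the example.

```scala
import io.delta.tables.DeltaTable
import org.apache.spark.sql.SparkSession

// Hypothetical local session with the Delta SQL extension and catalog enabled.
val spark = SparkSession.builder()
  .appName("history-demo")
  .master("local[*]")
  .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
  .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
  .getOrCreate()

// Placeholder path; point it at an existing delta table.
val tablePath = "/tmp/delta/events"

// DeltaTable.history operator, limited to the 5 most recent commits.
DeltaTable.forPath(spark, tablePath).history(5).show(truncate = false)

// DESCRIBE HISTORY SQL command over the same table.
spark.sql(s"DESCRIBE HISTORY delta.`$tablePath`").show(truncate = false)
```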
getCommitInfo¶
getCommitInfo(
logStore: LogStore,
basePath: Path,
version: Long): CommitInfo
getCommitInfo ...FIXME
getActiveCommitAtTime¶
getActiveCommitAtTime(
timestamp: Timestamp,
canReturnLastCommit: Boolean,
mustBeRecreatable: Boolean = true,
canReturnEarliestCommit: Boolean = false): Commit
getActiveCommitAtTime determines the earliest commit to find based on the given mustBeRecreatable flag (default: true):
- When enabled (default), getActiveCommitAtTime uses getEarliestRecreatableCommit
- When disabled, getActiveCommitAtTime uses getEarliestDeltaFile
getActiveCommitAtTime requests the DeltaLog to update, which gives the latest Snapshot, and then requests the Snapshot for its version (the latest version).
getActiveCommitAtTime finds the commit. Depending on how many commits there are to fetch (relative to maxKeysPerList), getActiveCommitAtTime either uses parallelSearch or not.
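The choice between a direct scan and parallelSearch can be pictured with a small, self-contained model. This is a sketch only: Commit here is a simplified stand-in, lastCommitAtOrBefore and needsParallelSearch are hypothetical helpers, and the threshold of twice maxKeysPerList is an assumption of the example rather than something stated on this page.

```scala
// Simplified stand-in for a commit: version plus commit timestamp (ms).
final case class Commit(version: Long, timestamp: Long)

// Latest commit whose timestamp is at or before `time`, if any.
// Assumes `commits` is sorted by version with non-decreasing timestamps.
def lastCommitAtOrBefore(commits: Seq[Commit], time: Long): Option[Commit] =
  commits.takeWhile(_.timestamp <= time).lastOption

// Hypothetical decision rule (an assumption of this sketch): scan directly
// when the version range is small relative to maxKeysPerList, otherwise
// switch to a chunked, parallel search.
def needsParallelSearch(earliest: Long, latest: Long, maxKeysPerList: Int): Boolean =
  latest - earliest > 2L * maxKeysPerList

// Made-up history of three commits.
val commits = Seq(Commit(0, 100), Commit(1, 200), Commit(2, 300))
```

With this data, lastCommitAtOrBefore(commits, 250) gives the commit at version 1, and a range of three versions stays well below the assumed threshold for the default maxKeysPerList of 1000.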
getActiveCommitAtTime ...FIXME
getActiveCommitAtTime is used when:
- DeltaTableUtils utility is used to resolveTimeTravelVersion
- DeltaSource is requested for getStartingVersion
parallelSearch¶
parallelSearch(
time: Long,
start: Long,
end: Long): Commit
parallelSearch finds the latest Commit that happened at or before the given time in the range [start, end).
parallelSearch uses parallelSearch0.
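One way to picture a search for the latest commit at or before a time over [start, end) is to split the range into chunks, pick one candidate per chunk and then choose among the candidates. The sketch below is an illustration under stated assumptions, not the Delta Lake code: Commit is a simplified stand-in, chunkedSearch and readCommits are hypothetical names, every version in the range is assumed to have a commit, and timestamps are assumed to be non-decreasing in version.

```scala
// Simplified stand-in for a commit: version plus commit timestamp (ms).
final case class Commit(version: Long, timestamp: Long)

// Latest commit whose timestamp is at or before `time`, if any.
def lastAtOrBefore(commits: Seq[Commit], time: Long): Option[Commit] =
  commits.takeWhile(_.timestamp <= time).lastOption

def chunkedSearch(
    readCommits: (Long, Long) => Seq[Commit], // hypothetical: commits in [lo, hi)
    time: Long,
    start: Long,
    end: Long,
    step: Int): Commit = {
  require(step > 0 && start < end)
  // One candidate per chunk of `step` versions. The chunks are independent,
  // so they could be processed in parallel; here they run sequentially.
  val candidates = (start until end).by(step.toLong).map { lo =>
    val chunk = readCommits(lo, math.min(lo + step, end))
    lastAtOrBefore(chunk, time).getOrElse(chunk.head)
  }
  // Falls back to the first candidate when no commit is at or before `time`.
  lastAtOrBefore(candidates, time).getOrElse(candidates.head)
}

// Made-up log: version v was committed at timestamp v * 10.
val log: (Long, Long) => Seq[Commit] =
  (lo, hi) => (lo until hi).map(v => Commit(v, v * 10))
```

For example, chunkedSearch(log, time = 47, start = 0, end = 10, step = 3) returns the commit at version 4 (timestamp 40), the latest one at or before 47.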
parallelSearch0¶
parallelSearch0
parallelSearch0 ...FIXME