
DeltaHistoryManager

DeltaHistoryManager is used to access the version and commit history of a delta table.

Creating Instance

DeltaHistoryManager takes the following to be created:

DeltaHistoryManager is created when:

Maximum Number of Keys (per List API Call)

DeltaHistoryManager can be given the maximum number of keys per List API call when created.

Unless given, maxKeysPerList is 1000.

The value of maxKeysPerList can be configured using the spark.databricks.delta.history.maxKeysPerList configuration property.
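
As a minimal sketch, the property could be set when the SparkSession is built. This assumes Spark and Delta Lake are on the classpath; the object name, application name, and the value 500 are illustrative only and not taken from the Delta Lake sources.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical demo object; only the property name comes from the text above.
object MaxKeysPerListDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("delta-history-demo") // hypothetical application name
      .master("local[*]")
      // Example value; when the property is not set, maxKeysPerList is 1000
      .config("spark.databricks.delta.history.maxKeysPerList", "500")
      .getOrCreate()

    // Read the value back from the session configuration
    println(spark.conf.get("spark.databricks.delta.history.maxKeysPerList"))

    spark.stop()
  }
}
```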

maxKeysPerList is used to look up the active commit at a given time (in parallelSearch).

Table History (Versions)

getHistory(
  start: Long,
  end: Option[Long] = None): Seq[CommitInfo]
getHistory(
  limitOpt: Option[Int]): Seq[CommitInfo]

getHistory...FIXME


getHistory is used when:

getCommitInfo

getCommitInfo(
  logStore: LogStore,
  basePath: Path,
  version: Long): CommitInfo

getCommitInfo...FIXME

getActiveCommitAtTime

getActiveCommitAtTime(
  timestamp: Timestamp,
  canReturnLastCommit: Boolean,
  mustBeRecreatable: Boolean = true,
  canReturnEarliestCommit: Boolean = false): Commit

getActiveCommitAtTime determines the earliest commit to find based on the given mustBeRecreatable flag (default: true):

getActiveCommitAtTime requests the DeltaLog to update. The update gives the latest Snapshot, which is in turn requested for its version (the latest version).

getActiveCommitAtTime then finds the commit. Depending on the number of commits to fetch (relative to maxKeysPerList), getActiveCommitAtTime either uses parallelSearch or searches the commits directly.
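
The choice between the two search paths can be sketched as a comparison of the version range against maxKeysPerList. The SearchStrategy object and its names are hypothetical, and the threshold of twice maxKeysPerList is an assumption of this sketch, not something stated on this page.

```scala
// Hypothetical helper illustrating the decision only; not part of Delta Lake.
object SearchStrategy {
  sealed trait Strategy
  case object Parallel extends Strategy   // stands for the parallelSearch path
  case object Sequential extends Strategy // stands for the direct search path

  /**
   * Picks a strategy from the number of commits between the earliest and
   * the latest versions. The factor of 2 is an assumption of this sketch.
   * The default of 1000 mirrors the default maxKeysPerList.
   */
  def choose(
      earliestVersion: Long,
      latestVersion: Long,
      maxKeysPerList: Int = 1000): Strategy =
    if (latestVersion - earliestVersion > 2L * maxKeysPerList) Parallel
    else Sequential
}
```

Under these assumptions, a table with 5000 versions and the default maxKeysPerList would take the parallel path, while one with 2000 versions would not.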

getActiveCommitAtTime...FIXME


getActiveCommitAtTime is used when:

parallelSearch

parallelSearch(
  time: Long,
  start: Long,
  end: Long): Commit

parallelSearch finds the latest Commit that happened at or before the given time in the range [start, end).
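
The contract above (latest commit at or before a time, within the half-open range [start, end)) can be illustrated with a plain sequential binary search. This is a sketch of the contract only, not the parallel implementation. The Commit case class, the timestampOf lookup, and the Option result are all hypothetical, and commit timestamps are assumed to be non-decreasing in the version.

```scala
// Hypothetical stand-in for the Commit type; only the fields needed here.
final case class Commit(version: Long, timestamp: Long)

object LatestCommitSearch {
  /**
   * Finds the latest commit with timestamp <= time among versions [start, end).
   * `timestampOf` is a hypothetical lookup from a version to its commit
   * timestamp, assumed to be non-decreasing in the version.
   * Returns None when every commit in the range happened after `time`.
   */
  def latestAtOrBefore(
      time: Long,
      start: Long,
      end: Long)(timestampOf: Long => Long): Option[Commit] = {
    var lo = start // every version below lo has timestamp <= time
    var hi = end   // every version at or above hi has timestamp > time
    while (lo < hi) {
      val mid = lo + (hi - lo) / 2
      if (timestampOf(mid) <= time) lo = mid + 1 else hi = mid
    }
    // lo is now the first version after `time`; the answer is the one before it
    if (lo == start) None else Some(Commit(lo - 1, timestampOf(lo - 1)))
  }
}
```

Note that end is exclusive, so a commit at version end is never returned even if its timestamp qualifies.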

parallelSearch uses parallelSearch0.

parallelSearch0

parallelSearch0

parallelSearch0...FIXME