SnapshotManagement¶
SnapshotManagement
is an extension for DeltaLog to manage Snapshots.
Demo¶
val name = "employees"
val dataPath = s"/tmp/delta/$name"
sql(s"DROP TABLE $name")
sql(s"""
| CREATE TABLE $name (id bigint, name string, city string)
| USING delta
| OPTIONS (path='$dataPath')
""".stripMargin)
import org.apache.spark.sql.delta.DeltaLog
val log = DeltaLog.forTable(spark, dataPath)
import org.apache.spark.sql.delta.SnapshotManagement
assert(log.isInstanceOf[SnapshotManagement], "DeltaLog is a SnapshotManagement")
val snapshot = log.update(stalenessAcceptable = false)
scala> :type snapshot
org.apache.spark.sql.delta.Snapshot
assert(snapshot.version == 0)
Current Snapshot¶
currentSnapshot: Snapshot
currentSnapshot
is a registry with the current Snapshot of a Delta table.
When DeltaLog is created, currentSnapshot
is initialized as getSnapshotAtInit and changed every update.
currentSnapshot
...FIXME
currentSnapshot
is used when:
DeltaLog
is requested to isValidSnapshotManagement
is requested to...FIXME
Updating Current Snapshot¶
update(
stalenessAcceptable: Boolean = false): Snapshot
update
...FIXME
update
is used when:
DeltaLog
is requested to start a transaction and withNewTransactionOptimisticTransactionImpl
is requested to doCommit and getNextAttemptVersionDeltaTableV2
is requested for a SnapshotTahoeLogFileIndex
is requested for a SnapshotDeltaHistoryManager
is requested to getHistory, getActiveCommitAtTime, checkVersionExists- In Delta commands...
tryUpdate¶
tryUpdate(
isAsync: Boolean = false): Snapshot
tryUpdate
...FIXME
tryUpdate
is used when SnapshotManagement
is requested to update.
updateInternal¶
updateInternal(
isAsync: Boolean): Snapshot
updateInternal
...FIXME
updateInternal
is used when SnapshotManagement
is requested to update (and tryUpdate).
Loading Latest Snapshot¶
getSnapshotAtInit: Snapshot
getSnapshotAtInit
getLogSegmentFrom for the last checkpoint.
getSnapshotAtInit
prints out the following INFO message to the logs:
Loading version [version][startCheckpoint]
getSnapshotAtInit
creates a Snapshot for the log segment.
getSnapshotAtInit
records the current time in lastUpdateTimestamp registry.
getSnapshotAtInit
prints out the following INFO message to the logs:
Returning initial snapshot [snapshot]
getSnapshotAtInit
is used when SnapshotManagement
is created (and initializes the currentSnapshot registry).
getLogSegmentFrom¶
getLogSegmentFrom(
startingCheckpoint: Option[CheckpointMetaData]): LogSegment
getLogSegmentFrom
getLogSegmentForVersion for the version of the given CheckpointMetaData
(if specified) as a start checkpoint version or leaves it undefined.
getLogSegmentFrom
is used when SnapshotManagement
is requested for getSnapshotAtInit.
getLogSegmentForVersion¶
getLogSegmentForVersion(
startCheckpoint: Option[Long],
versionToLoad: Option[Long] = None): LogSegment
getLogSegmentForVersion
...FIXME
getLogSegmentForVersion
is used when SnapshotManagement
is requested for getLogSegmentFrom, updateInternal and getSnapshotAt.
Creating Snapshot¶
createSnapshot(
segment: LogSegment,
minFileRetentionTimestamp: Long,
timestamp: Long): Snapshot
createSnapshot
readChecksum (for the version of the given LogSegment
) and creates a Snapshot.
createSnapshot
is used when SnapshotManagement
is requested for getSnapshotAtInit, getSnapshotAt and update.
Last Successful Update Timestamp¶
SnapshotManagement
uses lastUpdateTimestamp
internal registry for the timestamp of the last successful update.