Checkpoints

Checkpoints is an abstraction of DeltaLogs that can checkpoint the current state of a delta table.

Checkpoints is supposed to be used with DeltaLog (or its subtypes) only.

Contract

dataPath

dataPath: Path

Hadoop Path to the data directory of the delta table

doLogCleanup

doLogCleanup(): Unit

Used when:

logPath

logPath: Path

Hadoop Path to the log directory of the delta table

metadata

metadata: Metadata

Metadata of the delta table

snapshot

snapshot: Snapshot

Snapshot of the delta table

store

store: LogStore

LogStore

Implementations

_last_checkpoint Metadata File

Checkpoints uses the _last_checkpoint metadata file (under the log path) to store and load the metadata of the latest checkpoint.
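The shape of the _last_checkpoint file can be sketched as a single line of JSON with the checkpoint version, the number of actions (size) and, for multi-part checkpoints, the number of parts. A minimal sketch in Python (the concrete values are hypothetical; the exact fields come from CheckpointMetaData):

```python
import json
import os
import tempfile

# Hypothetical _last_checkpoint content: version, number of actions (size)
# and, for a multi-part checkpoint, the number of parts.
last_checkpoint = {"version": 10, "size": 25, "parts": 2}

# Write it as one line of JSON under a stand-in log directory.
log_dir = tempfile.mkdtemp()
path = os.path.join(log_dir, "_last_checkpoint")
with open(path, "w") as f:
    f.write(json.dumps(last_checkpoint))

# Reading it back recovers the checkpoint metadata.
with open(path) as f:
    meta = json.loads(f.read())
print(meta["version"])  # 10
```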

Checkpointing

checkpoint(): Unit
checkpoint(
  snapshotToCheckpoint: Snapshot): CheckpointMetaData

checkpoint writes a checkpoint of the current state of the delta table (Snapshot). That produces checkpoint metadata with the version, the number of actions and, possibly, the number of parts (for multi-part checkpoints).

checkpoint requests the LogStore to overwrite the _last_checkpoint file with the JSON-encoded checkpoint metadata.

In the end, checkpoint cleans up the expired logs (if enabled).
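The write-then-overwrite step above can be sketched as follows (a Python model for illustration only; `write_last_checkpoint` is a hypothetical helper standing in for the LogStore overwrite, not Delta Lake's code):

```python
import json
import os
import tempfile

def write_last_checkpoint(log_dir, version, size, parts=None):
    """Sketch: after the checkpoint file(s) for a snapshot are written,
    overwrite _last_checkpoint with the JSON-encoded checkpoint metadata."""
    meta = {"version": version, "size": size}
    if parts is not None:
        # Only multi-part checkpoints record the number of parts.
        meta["parts"] = parts
    with open(os.path.join(log_dir, "_last_checkpoint"), "w") as f:
        f.write(json.dumps(meta))  # models LogStore.write with overwrite
    return meta

log_dir = tempfile.mkdtemp()
print(write_last_checkpoint(log_dir, version=10, size=25))
```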

checkpoint is used when:

Writing Out State Checkpoint

writeCheckpoint(
  spark: SparkSession,
  deltaLog: DeltaLog,
  snapshot: Snapshot): CheckpointMetaData

writeCheckpoint...FIXME

Loading Latest Checkpoint Metadata

lastCheckpoint: Option[CheckpointMetaData]

lastCheckpoint loads the latest checkpoint metadata using loadMetadataFromFile (allowing for 3 retries).

lastCheckpoint is used when:

loadMetadataFromFile

loadMetadataFromFile(
  tries: Int): Option[CheckpointMetaData]

loadMetadataFromFile loads the _last_checkpoint file (in JSON format) and converts it to CheckpointMetaData (with a version, size and parts).

loadMetadataFromFile uses the LogStore to read the _last_checkpoint file.

In case the _last_checkpoint file is corrupted, loadMetadataFromFile...FIXME
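A retry-on-corruption loop of this shape can be sketched in Python (an assumption for illustration, not Delta Lake's actual recovery logic; how loadMetadataFromFile really recovers from corruption is left as FIXME above):

```python
import json
import os
import tempfile

def load_metadata_from_file(log_dir, tries):
    """Sketch: read _last_checkpoint, parse its JSON and retry a few
    times when the content is corrupted, giving up with None."""
    path = os.path.join(log_dir, "_last_checkpoint")
    try:
        with open(path) as f:
            return json.loads(f.read())
    except FileNotFoundError:
        return None  # no checkpoint has been written yet
    except json.JSONDecodeError:
        if tries > 0:
            # Hypothetical recovery: retry the read a bounded number of times.
            return load_metadata_from_file(log_dir, tries - 1)
        return None  # corrupted beyond the allowed retries

log_dir = tempfile.mkdtemp()
print(load_metadata_from_file(log_dir, tries=3))  # None: no _last_checkpoint yet
```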
