DeltaSourceSnapshot

DeltaSourceSnapshot is a SnapshotIterator and a StateCache for DeltaSource.

Creating Instance

DeltaSourceSnapshot takes the following to be created:

DeltaSourceSnapshot is created when:

initialFiles Dataset (of IndexedFiles)

initialFiles: Dataset[IndexedFile]

Dataset of Indexed AddFiles

initialFiles requests the Snapshot for all the AddFiles it contains (as a Dataset[AddFile]).

initialFiles sorts the AddFile dataset (Dataset[AddFile]) by modificationTime and path in ascending order.

initialFiles indexes the AddFiles (using the RDD.zipWithIndex operator), which gives an RDD[(AddFile, Long)].

initialFiles converts the RDD to a DataFrame of two columns: add and index.

initialFiles adds the two new columns:

initialFiles converts (projects) the DataFrame to a Dataset[IndexedFile].
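The sort-then-index steps above can be sketched with plain Scala collections. This is a simplified model, not the Delta Lake code: the AddFile and IndexedFile case classes below are hypothetical stand-ins that carry only the fields the sketch needs, a Seq takes the place of the Dataset and the RDD, and the extra columns that initialFiles adds are left out.

```scala
// Hypothetical, simplified stand-ins for Delta Lake's AddFile and IndexedFile.
final case class AddFile(path: String, modificationTime: Long)
final case class IndexedFile(index: Long, add: AddFile)

object InitialFilesSketch {
  // Sort by modificationTime, then by path (both ascending),
  // and number the sorted files starting from 0.
  def indexFiles(files: Seq[AddFile]): Seq[IndexedFile] =
    files
      .sortBy(f => (f.modificationTime, f.path))
      // Seq.zipWithIndex yields Int indices; RDD.zipWithIndex yields Long ones.
      .zipWithIndex
      .map { case (add, idx) => IndexedFile(idx.toLong, add) }

  def main(args: Array[String]): Unit = {
    val files = Seq(
      AddFile("part-b.parquet", 200L),
      AddFile("part-c.parquet", 100L),
      AddFile("part-a.parquet", 200L))
    // The oldest file comes first; ties on modificationTime are broken by path.
    indexFiles(files).foreach(println)
  }
}
```

Sorting on both modificationTime and path makes the order, and hence the index assigned to every file, deterministic even when several files share a modification time.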

Creating CachedDS

initialFiles caches the Dataset[IndexedFile] under the following name (with the version and the redactedPath of this Snapshot):

Delta Source Snapshot #[version] - [redactedPath]

Cached Dataset of Indexed AddFiles

In the end, initialFiles requests the CachedDS to getDS.
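As an illustration of the naming pattern only, the cache name can be built with string interpolation. The cacheName helper and the sample values are hypothetical; in Delta Lake the version and the redactedPath come from the Snapshot.

```scala
object CacheNameSketch {
  // Hypothetical helper that fills in the name template with a version and a redacted path.
  def cacheName(version: Long, redactedPath: String): String =
    s"Delta Source Snapshot #$version - $redactedPath"

  def main(args: Array[String]): Unit = {
    // Sample values, for illustration only.
    println(cacheName(3L, "/tmp/delta/t"))
  }
}
```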

Usage

initialFiles is used when:

  • SnapshotIterator is requested for the AddFiles

Closing

close(
  unpersistSnapshot: Boolean): Unit

close is part of the SnapshotIterator abstraction.

close requests the Snapshot to uncache when the given unpersistSnapshot flag is enabled.
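
The behaviour of close can be modelled with a minimal sketch. The UncacheableSnapshot trait and the SnapshotIteratorSketch class are hypothetical stand-ins (not the Delta Lake types); they only show that uncache is requested when, and only when, the flag is enabled.

```scala
// Hypothetical stand-in for the part of Snapshot that this sketch needs.
trait UncacheableSnapshot {
  def uncache(): Unit
}

// Hypothetical holder of a snapshot; only close is modelled.
final class SnapshotIteratorSketch(snapshot: UncacheableSnapshot) {
  def close(unpersistSnapshot: Boolean): Unit =
    if (unpersistSnapshot) snapshot.uncache()
}

object CloseSketch {
  def main(args: Array[String]): Unit = {
    var uncacheCalls = 0
    val snapshot = new UncacheableSnapshot {
      def uncache(): Unit = uncacheCalls += 1
    }
    val it = new SnapshotIteratorSketch(snapshot)
    it.close(unpersistSnapshot = false) // the snapshot stays cached
    it.close(unpersistSnapshot = true)  // the snapshot is requested to uncache
    println(s"uncache calls: $uncacheCalls")
  }
}
```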