DeltaSourceSnapshot is a SnapshotIterator with StateCache.

DeltaSourceSnapshot is created when DeltaSource is requested for the snapshot at a given version.

When created, DeltaSourceSnapshot requests the Snapshot for its version, which it later uses when building the initialFiles (as the value of a new version column and as part of the name of the cached RDD).

Creating DeltaSourceSnapshot Instance

DeltaSourceSnapshot takes the following to be created:

  • SparkSession

  • Snapshot

  • Filter expressions (Seq[Expression])

Initial Files (Indexed AddFiles) — initialFiles Method

initialFiles: Dataset[IndexedFile]

initialFiles requests the Snapshot for all files (Dataset[AddFile]) and sorts them by modificationTime and path in ascending order.

initialFiles zips the AddFiles with indices (using RDD.zipWithIndex operator), adds two new columns with the version and isLast as false, and finally creates a Dataset[IndexedFile].
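
The sort-and-index steps above can be sketched with plain Scala collections. This is only an illustration under stated assumptions, not Delta Lake's code: the AddFile and IndexedFile case classes below are simplified stand-ins with reduced fields, indexFiles is a hypothetical helper name, and Seq.zipWithIndex stands in for RDD.zipWithIndex (both assign 0-based indices in element order).

```scala
// Simplified stand-ins for Delta Lake's AddFile and IndexedFile
// (assumption: the real classes carry more fields than shown here).
case class AddFile(path: String, modificationTime: Long)
case class IndexedFile(version: Long, index: Long, add: AddFile, isLast: Boolean = false)

// Hypothetical helper: sorts by modificationTime, then path (both ascending),
// zips with 0-based indices, and attaches the version with isLast = false.
def indexFiles(version: Long, files: Seq[AddFile]): Seq[IndexedFile] =
  files
    .sortBy(f => (f.modificationTime, f.path))
    .zipWithIndex
    .map { case (add, idx) => IndexedFile(version, idx.toLong, add, isLast = false) }

val indexed = indexFiles(
  version = 5L,
  files = Seq(
    AddFile("part-b.parquet", 200L),
    AddFile("part-c.parquet", 100L),
    AddFile("part-a.parquet", 200L)))
```

Sorting on the (modificationTime, path) pair makes the ordering deterministic when several files share a modification time, so the same file gets the same index each time the snapshot is indexed.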

In the end, initialFiles caches the dataset under the following name (built from the version and the redactedPath of the Snapshot):

Delta Source Snapshot #[version] - [redactedPath]

initialFiles is used exclusively when SnapshotIterator is requested for an iterator (of IndexedFiles).