DeltaSourceSnapshot¶
DeltaSourceSnapshot is a SnapshotIterator and a StateCache for DeltaSource.
Creating Instance¶
DeltaSourceSnapshot takes the following to be created:
DeltaSourceSnapshot is created when:
DeltaSource is requested for the snapshot of a delta table at a given version
initialFiles Dataset (of IndexedFiles)¶
initialFiles: Dataset[IndexedFile]
Dataset of Indexed AddFiles¶
initialFiles requests the Snapshot for all AddFiles (in the snapshot) (Dataset[AddFile]).
initialFiles sorts the AddFile dataset (Dataset[AddFile]) by modificationTime and path in ascending order.
initialFiles indexes the AddFiles (using the RDD.zipWithIndex operator), which gives an RDD[(AddFile, Long)].
initialFiles converts the RDD to a DataFrame of two columns: add and index.
initialFiles adds two new columns:
- version
- isLast as false literal
initialFiles converts (projects) the DataFrame to Dataset[IndexedFile].
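The steps above (sort, index, add columns, project) can be sketched with plain Scala collections. This is an illustration only, not the Delta Lake implementation: AddFile and IndexedFile below are simplified stand-ins with reduced fields, InitialFilesSketch is a hypothetical name, and Seq.zipWithIndex stands in for RDD.zipWithIndex.

```scala
// Simplified stand-ins for Delta Lake's AddFile and IndexedFile (assumed, reduced fields)
final case class AddFile(path: String, modificationTime: Long)
final case class IndexedFile(version: Long, index: Long, add: AddFile, isLast: Boolean = false)

object InitialFilesSketch {
  // Mirrors the documented steps on an in-memory Seq instead of a Dataset
  def initialFiles(version: Long, allFiles: Seq[AddFile]): Seq[IndexedFile] =
    allFiles
      .sortBy(f => (f.modificationTime, f.path)) // ascending by modificationTime, then path
      .zipWithIndex                              // pairs every AddFile with its position
      .map { case (add, index) =>
        // add the version and isLast (always false) columns, then project to IndexedFile
        IndexedFile(version, index.toLong, add, isLast = false)
      }
}
```

For example, `InitialFilesSketch.initialFiles(5L, Seq(AddFile("b.parquet", 2L), AddFile("c.parquet", 1L)))` places `c.parquet` at index 0 because it has the earlier modificationTime.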
Creating CachedDS¶
initialFiles caches the Dataset[IndexedFile] under the following name (with the version and the redactedPath of this Snapshot):
Delta Source Snapshot #[version] - [redactedPath]
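As a small sketch of how such a name could be assembled, the following uses Scala string interpolation. CacheNameSketch and cacheName are hypothetical names, and version and redactedPath are passed in as plain values here rather than read from a Snapshot.

```scala
object CacheNameSketch {
  // Builds a name that follows the documented pattern; inputs are assumed to be already available
  def cacheName(version: Long, redactedPath: String): String =
    s"Delta Source Snapshot #$version - $redactedPath"
}
```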
Cached Dataset of Indexed AddFiles¶
In the end, initialFiles requests the CachedDS to getDS.
Usage¶
initialFiles is used when:
SnapshotIteratoris requested for the AddFiles
Closing¶
close(
unpersistSnapshot: Boolean): Unit
close is part of the SnapshotIterator abstraction.
close requests the Snapshot to uncache when the given unpersistSnapshot flag is enabled.
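The flag-guarded behavior can be sketched as follows. FakeSnapshot and SnapshotHolder are hypothetical stand-ins for illustration (the real Snapshot and DeltaSourceSnapshot carry much more state); only the shape of close follows the description above.

```scala
// Hypothetical stand-in for a snapshot whose cached state can be released
final class FakeSnapshot {
  var cached: Boolean = true
  def uncache(): Unit = cached = false
}

// Hypothetical holder that mimics the documented close contract
final class SnapshotHolder(snapshot: FakeSnapshot) {
  def close(unpersistSnapshot: Boolean): Unit =
    if (unpersistSnapshot) snapshot.uncache() // uncache only when the flag is enabled
}
```

With this sketch, `close(unpersistSnapshot = false)` leaves the snapshot cached, which is useful when the same snapshot is still needed elsewhere.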