DeltaSourceSnapshot¶
DeltaSourceSnapshot is a SnapshotIterator and a StateCache for DeltaSource.
Creating Instance¶
DeltaSourceSnapshot takes the following to be created:
DeltaSourceSnapshot is created when:
- DeltaSource is requested for the snapshot of a delta table at a given version
initialFiles Dataset (of IndexedFiles)¶
initialFiles: Dataset[IndexedFile]
Dataset of Indexed AddFiles¶
initialFiles requests the Snapshot for all AddFiles in the snapshot (Dataset[AddFile]).
initialFiles sorts the AddFile dataset (Dataset[AddFile]) by modificationTime and path in ascending order.
initialFiles indexes the AddFiles (using the RDD.zipWithIndex operator), which gives an RDD[(AddFile, Long)].
initialFiles converts the RDD to a DataFrame of two columns: add and index.
initialFiles adds the two new columns:
- version
- isLast as a false literal
initialFiles converts (projects) the DataFrame to a Dataset[IndexedFile].
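The steps so far (sort, index, add columns, project) can be sketched with plain Scala collections instead of Spark Datasets. This is only an illustration under stated assumptions: AddFile and IndexedFile here are simplified, hypothetical stand-ins with just the fields mentioned on this page, and InitialFilesSketch is not part of Delta Lake.

```scala
object InitialFilesSketch {
  // Simplified stand-ins (assumption): the real AddFile and IndexedFile
  // carry more fields than the ones this page mentions.
  final case class AddFile(path: String, modificationTime: Long)
  final case class IndexedFile(version: Long, index: Long, add: AddFile, isLast: Boolean)

  // Mirrors the described steps on an in-memory Seq rather than a Dataset.
  def initialFiles(addFiles: Seq[AddFile], version: Long): Seq[IndexedFile] =
    addFiles
      // Sort by modificationTime, then path, both ascending.
      .sortBy(f => (f.modificationTime, f.path))
      // Pair every file with its 0-based position in the sorted order.
      .zipWithIndex
      // Attach the snapshot version and a constant isLast = false.
      .map { case (add, index) =>
        IndexedFile(version, index.toLong, add, isLast = false)
      }
}
```

For example, given files b.parquet and a.parquet (both with modificationTime 10) and c.parquet (modificationTime 5), the sketch orders them as c.parquet, a.parquet, b.parquet with indices 0, 1 and 2.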
Creating CachedDS¶
initialFiles caches the Dataset[IndexedFile] under the following name (with the version and the redactedPath of this Snapshot):
Delta Source Snapshot #[version] - [redactedPath]
Cached Dataset of Indexed AddFiles¶
In the end, initialFiles requests the CachedDS to getDS.
Usage¶
initialFiles is used when:
- SnapshotIterator is requested for the AddFiles
Closing¶
close(unpersistSnapshot: Boolean): Unit
close is part of the SnapshotIterator abstraction.
close requests the Snapshot to uncache when the given unpersistSnapshot flag is enabled.
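The flag-guarded behavior of close can be illustrated with a minimal sketch. FakeSnapshot and SourceSnapshotSketch are hypothetical stand-ins introduced only for this example; they model nothing beyond "uncache when the flag is enabled".

```scala
object CloseSketch {
  // Hypothetical stand-in for a Snapshot that tracks whether its state is cached.
  final class FakeSnapshot {
    var cached: Boolean = true
    def uncache(): Unit = cached = false
  }

  // Hypothetical stand-in for DeltaSourceSnapshot, reduced to the close behavior.
  final class SourceSnapshotSketch(val snapshot: FakeSnapshot) {
    // Uncaches the underlying snapshot only when unpersistSnapshot is true.
    def close(unpersistSnapshot: Boolean): Unit =
      if (unpersistSnapshot) snapshot.uncache()
  }
}
```

In this sketch, close(unpersistSnapshot = false) leaves the snapshot cached, which would matter when the same snapshot is still needed elsewhere.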