StateCache¶
StateCache is an abstraction of state caches that can cache a Dataset and uncache them all.
Contract¶
SparkSession¶
spark: SparkSession
The SparkSession that the cached RDDs belong to
Implementations¶
Cached RDDs¶
cached: ArrayBuffer[RDD[_]]
StateCache tracks cached RDDs in the cached internal registry.
cached is given a new RDD when StateCache is requested to cache a Dataset.
cached is used when StateCache is requested to get a cached Dataset and to uncache.
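The registry behaviour described above can be illustrated with a small, Spark-free model. This is only a sketch under stated assumptions: `FakeRDD` and `RegistrySketch` are hypothetical names introduced here, and `FakeRDD` merely records whether it is persisted instead of being a real `RDD`.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for an RDD: it only remembers whether it is persisted.
final class FakeRDD(val name: String) {
  @volatile var persisted: Boolean = false
  def persist(): this.type = { persisted = true; this }
  def unpersist(): this.type = { persisted = false; this }
}

// Simplified model of the `cached` registry (not Delta Lake's actual code).
class RegistrySketch {
  private val cached = ArrayBuffer[FakeRDD]()

  // Caching persists the RDD and records it in the registry.
  def register(rdd: FakeRDD): FakeRDD = cached.synchronized {
    rdd.persist()
    cached += rdd
    rdd
  }

  // Number of RDDs tracked so far.
  def size: Int = cached.synchronized(cached.size)

  // Uncaching walks the registry and unpersists every tracked RDD.
  def uncacheAll(): Unit = cached.synchronized {
    cached.foreach(_.unpersist())
  }
}
```

Registering two `FakeRDD`s and then calling `uncacheAll()` leaves both unpersisted while the registry still lists them.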
Caching Dataset¶
cacheDS[A](
ds: Dataset[A],
name: String): CachedDS[A]
cacheDS creates a new CachedDS.
cacheDS is used when:
Snapshot is requested for the cachedState
DeltaSourceSnapshot is requested for the initialFiles
DataSkippingReaderBase is requested for the withStatsCache
Uncaching All Cached Datasets¶
uncache(): Unit
uncache unpersists the RDDs tracked in the cached internal registry. It uses the isCached internal flag to avoid multiple executions.
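A guard flag of this kind can be sketched as follows. This is a minimal model, not Delta Lake's code: `UncacheOnceSketch` and `unpersistCalls` are hypothetical names, and the double-checked locking around the flag is an assumption about how "avoid multiple executions" could be made thread-safe.

```scala
import java.util.concurrent.atomic.AtomicInteger

// Simplified model of an uncache operation that runs at most once.
class UncacheOnceSketch {
  // Starts as true: the state is considered cached until uncache runs.
  @volatile private var isCached = true

  // Counts how many times the (simulated) unpersist work actually ran.
  val unpersistCalls = new AtomicInteger(0)

  def uncache(): Unit = {
    // Cheap check first, then re-check under the lock (assumed pattern).
    if (isCached) this.synchronized {
      if (isCached) {
        isCached = false
        unpersistCalls.incrementAndGet() // stands in for unpersisting the RDDs
      }
    }
  }
}
```

Calling `uncache()` repeatedly on the same instance performs the simulated work only on the first call.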
uncache is used when:
DeltaLog utility is used to access deltaLogCache and a cached entry expires
SnapshotManagement is requested to update state of a Delta table
DeltaSourceSnapshot is requested to close