StateCache¶
StateCache is an abstraction of state caches that can cache Datasets and uncache them all at once.
Contract¶
SparkSession¶
spark: SparkSession
The SparkSession that the cached RDDs belong to
Implementations¶
Cached RDDs¶
cached: ArrayBuffer[RDD[_]]
StateCache tracks cached RDDs in the cached internal registry.
cached is given a new RDD when StateCache is requested to cache a Dataset.
cached is used when StateCache is requested to get a cached Dataset and uncache.
Caching Dataset¶
cacheDS[A](
  ds: Dataset[A],
  name: String): CachedDS[A]
cacheDS creates a new CachedDS.
cacheDS is used when:
Snapshot is requested for the cachedState
DeltaSourceSnapshot is requested for the initialFiles
DataSkippingReaderBase is requested for the withStatsCache
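The registration step can be illustrated with a small Spark-free model. This is only a sketch under stated assumptions, not the Delta Lake implementation: FakeRDD and StateCacheModel are hypothetical names, and FakeRDD stands in for a real RDD so that the example runs without a SparkSession.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for a Spark RDD: only the name and a persisted flag matter here.
final class FakeRDD(val name: String) {
  var persisted: Boolean = false
  def persist(): this.type = { persisted = true; this }
}

// Simplified model of the registry; NOT the Delta Lake implementation.
class StateCacheModel {
  // Mirrors the `cached` internal registry described above.
  val cached: ArrayBuffer[FakeRDD] = ArrayBuffer.empty

  // Persists a new entry, adds it to the registry, and returns it to the caller.
  def cacheDS(name: String): FakeRDD = cached.synchronized {
    val rdd = new FakeRDD(name).persist()
    cached += rdd
    rdd
  }
}

object StateCacheModelDemo {
  def main(args: Array[String]): Unit = {
    val cache = new StateCacheModel
    cache.cacheDS("cachedState")
    cache.cacheDS("initialFiles")
    // Every cached entry is now tracked by the registry.
    println(cache.cached.map(_.name).mkString(", "))
  }
}
```

Keeping every cached entry in one registry is what later makes it possible to release them all with a single call.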
Uncaching All Cached Datasets¶
uncache(): Unit
uncache uses the isCached internal flag to avoid multiple executions.
uncache is used when:
DeltaLog utility is used to access deltaLogCache and a cached entry expires
SnapshotManagement is requested to update state of a Delta table
DeltaSourceSnapshot is requested to close
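The effect of the isCached flag can be sketched with a Spark-free model. The names TrackedEntry and UncacheModel are hypothetical, and the body of uncache is an assumption about how such a guard is typically written, not a copy of the Delta Lake source.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for a cached RDD that counts how often it is unpersisted.
final class TrackedEntry(val name: String) {
  var unpersistCalls: Int = 0
  def unpersist(): Unit = unpersistCalls += 1
}

// Simplified model of the guard; NOT the Delta Lake implementation.
class UncacheModel(entries: Seq[TrackedEntry]) {
  private val cached = ArrayBuffer.empty[TrackedEntry] ++= entries
  private var isCached = true

  // Unpersists every registered entry, but only on the first call.
  def uncache(): Unit = cached.synchronized {
    if (isCached) {
      isCached = false
      cached.foreach(_.unpersist())
    }
  }
}

object UncacheModelDemo {
  def main(args: Array[String]): Unit = {
    val entries = Seq(new TrackedEntry("cachedState"), new TrackedEntry("initialFiles"))
    val model = new UncacheModel(entries)
    model.uncache()
    model.uncache() // no-op: the flag was cleared by the first call
    println(entries.map(_.unpersistCalls).mkString(", ")) // prints "1, 1"
  }
}
```

Because the flag is flipped before the entries are released, callers such as a cache-expiry hook and an explicit close can both request uncache without releasing the same entries twice.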