FileStatusCache
FileStatusCache is an abstraction of Spark application-wide FileStatus caches for Partition File Metadata Caching.
FileStatusCache is created using the FileStatusCache.getOrCreate factory method.
FileStatusCache is used to create an InMemoryFileIndex.
Contract
getLeafFiles
getLeafFiles(
path: Path): Option[Array[FileStatus]]
Default: None (no leaf files cached for the path)
Used when:
- InMemoryFileIndex is requested to listLeafFiles
invalidateAll
invalidateAll(): Unit
Used when:
putLeafFiles
putLeafFiles(
path: Path,
leafFiles: Array[FileStatus]): Unit
Used when:
- InMemoryFileIndex is requested to listLeafFiles
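The contract amounts to a cache-aside lookup keyed by path. The following is a minimal Scala sketch, not Spark's code: Path and FileStatus are simplified stand-ins for the Hadoop classes, MapCache is a hypothetical unbounded cache, and the listLeafFiles helper only approximates how InMemoryFileIndex consults getLeafFiles before listing a path and calls putLeafFiles afterwards (the real method takes many paths and lists the uncached ones in bulk).

```scala
import scala.collection.mutable

// Simplified stand-ins for org.apache.hadoop.fs.Path and FileStatus (hypothetical)
final case class Path(value: String)
final case class FileStatus(path: Path, length: Long)

// The three-method contract described above
abstract class FileStatusCache {
  // Default: None, i.e. nothing cached for the path
  def getLeafFiles(path: Path): Option[Array[FileStatus]] = None
  def putLeafFiles(path: Path, leafFiles: Array[FileStatus]): Unit
  def invalidateAll(): Unit
}

// Hypothetical unbounded cache, for illustration only
final class MapCache extends FileStatusCache {
  private val entries = mutable.Map.empty[Path, Array[FileStatus]]
  override def getLeafFiles(path: Path): Option[Array[FileStatus]] = entries.get(path)
  override def putLeafFiles(path: Path, leafFiles: Array[FileStatus]): Unit =
    entries(path) = leafFiles
  override def invalidateAll(): Unit = entries.clear()
}

// Hypothetical helper: consult the cache first, list the file system only on a miss
def listLeafFiles(
    path: Path,
    cache: FileStatusCache,
    listFromFileSystem: Path => Array[FileStatus]): Array[FileStatus] =
  cache.getLeafFiles(path) match {
    case Some(cached) => cached
    case None =>
      val listed = listFromFileSystem(path)
      cache.putLeafFiles(path, listed)
      listed
  }

// Usage: count how often the fake file system is listed
var fsCalls = 0
val fakeFs: Path => Array[FileStatus] = p => {
  fsCalls += 1
  Array(FileStatus(Path(p.value + "/part-00000"), 128L))
}
val cache = new MapCache
val table = Path("/warehouse/t")
listLeafFiles(table, cache, fakeFs)
listLeafFiles(table, cache, fakeFs) // served from the cache, fsCalls stays 1
cache.invalidateAll()
listLeafFiles(table, cache, fakeFs) // cache was cleared, fsCalls becomes 2
```

Invalidating drops every cached listing, so the next lookup goes back to the (fake) file system.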
Implementations
- NoopCache
- SharedInMemoryCache
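The two implementations differ in whether anything is kept at all. Below is a rough Scala sketch under stated assumptions: NoopCache caches nothing, while SimpleSharedCache is a hypothetical stand-in for SharedInMemoryCache. It assumes one shared LRU map whose keys are namespaced per client, so createForNewClient hands out views that see and invalidate only their own entries. The real SharedInMemoryCache bounds its contents by estimated size in bytes; this sketch swaps in a simpler bound on the number of entries.

```scala
import java.util.{LinkedHashMap => JLinkedHashMap, Map => JMap}
import java.util.concurrent.atomic.AtomicLong

// Simplified stand-ins for the Hadoop classes (hypothetical)
final case class Path(value: String)
final case class FileStatus(path: Path, length: Long)

abstract class FileStatusCache {
  def getLeafFiles(path: Path): Option[Array[FileStatus]] = None
  def putLeafFiles(path: Path, leafFiles: Array[FileStatus]): Unit
  def invalidateAll(): Unit
}

// Caches nothing: every lookup falls back to the default None
object NoopCache extends FileStatusCache {
  override def putLeafFiles(path: Path, leafFiles: Array[FileStatus]): Unit = ()
  override def invalidateAll(): Unit = ()
}

// Hypothetical shared cache: one access-ordered LRU map for all clients,
// keyed by (clientId, path) and bounded by entry count (not bytes)
final class SimpleSharedCache(maxEntries: Int) {
  private val nextClientId = new AtomicLong(0L)
  private val lru =
    new JLinkedHashMap[(Long, Path), Array[FileStatus]](16, 0.75f, true) {
      override protected def removeEldestEntry(
          eldest: JMap.Entry[(Long, Path), Array[FileStatus]]): Boolean =
        size() > maxEntries // evict the least recently used entry
    }

  def createForNewClient(): FileStatusCache = {
    val clientId = nextClientId.getAndIncrement()
    new FileStatusCache {
      override def getLeafFiles(path: Path): Option[Array[FileStatus]] =
        lru.synchronized(Option(lru.get((clientId, path))))
      override def putLeafFiles(path: Path, leafFiles: Array[FileStatus]): Unit =
        lru.synchronized { lru.put((clientId, path), leafFiles); () }
      // Drops only this client's entries
      override def invalidateAll(): Unit = lru.synchronized {
        val it = lru.keySet().iterator()
        while (it.hasNext) if (it.next()._1 == clientId) it.remove()
      }
    }
  }
}

// Usage: two clients share one cache but keep separate entries
val shared = new SimpleSharedCache(maxEntries = 100)
val clientA = shared.createForNewClient()
val clientB = shared.createForNewClient()
val table = Path("/warehouse/t")
val files = Array(FileStatus(Path("/warehouse/t/part-00000"), 128L))
clientA.putLeafFiles(table, files)
clientB.putLeafFiles(table, files)
clientA.invalidateAll() // clientB's entry survives
```

In this sketch, namespacing keys by client keeps one index's invalidateAll from discarding another index's listings while both still draw on a single shared bound.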
Looking Up FileStatusCache
getOrCreate(
session: SparkSession): FileStatusCache
getOrCreate creates the application-wide SharedInMemoryCache (unless it already exists) when all of the following hold:
- spark.sql.hive.manageFilesourcePartitions is enabled
- spark.sql.hive.filesourcePartitionFileCacheSize is greater than 0
getOrCreate then requests the SharedInMemoryCache to createForNewClient.
Otherwise, getOrCreate returns the NoopCache (which does no caching).
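The selection logic can be modelled in a few lines of Scala. This is a simplified sketch, not Spark's source: SqlConf, SharedCache, ClientCache and CacheFactory are hypothetical names, and the two SqlConf fields stand in for the two configuration properties. It assumes the shared cache is created once, on the first qualifying call, and that every later call only registers a new client.

```scala
// Hypothetical stand-in for the two SQL configuration properties
final case class SqlConf(
    manageFilesourcePartitions: Boolean,    // spark.sql.hive.manageFilesourcePartitions
    filesourcePartitionFileCacheSize: Long) // spark.sql.hive.filesourcePartitionFileCacheSize

sealed trait Cache
case object NoopCache extends Cache
final case class ClientCache(shared: SharedCache, clientId: Int) extends Cache

// Hypothetical shared cache that only tracks how many clients it has handed out
final class SharedCache(val maxSizeInBytes: Long) {
  private var clients = 0
  def createForNewClient(): Cache = synchronized {
    clients += 1
    ClientCache(this, clients)
  }
}

object CacheFactory {
  // Created lazily on the first qualifying call, then reused application-wide
  private var sharedCache: SharedCache = null

  def getOrCreate(conf: SqlConf): Cache = synchronized {
    if (conf.manageFilesourcePartitions && conf.filesourcePartitionFileCacheSize > 0) {
      if (sharedCache == null) {
        sharedCache = new SharedCache(conf.filesourcePartitionFileCacheSize)
      }
      sharedCache.createForNewClient()
    } else {
      NoopCache // caching disabled: hand out the no-op cache
    }
  }
}

// Usage: two lookups with caching enabled share one SharedCache
val conf = SqlConf(manageFilesourcePartitions = true, filesourcePartitionFileCacheSize = 256L * 1024 * 1024)
val first = CacheFactory.getOrCreate(conf)
val second = CacheFactory.getOrCreate(conf)
```

Keeping a single shared cache behind a synchronized factory means, in this sketch, that every caller draws on one size budget while still getting its own client view.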
getOrCreate is used when:
- CatalogFileIndex is requested for the FileStatusCache
- DataSource is requested to create an InMemoryFileIndex
- FileTable is requested for the PartitioningAwareFileIndex (for a non-streaming file-based data source)