FileStatusCache¶
FileStatusCache is an abstraction of Spark application-wide FileStatus Caches for Partition File Metadata Caching.
FileStatusCache is created using FileStatusCache.getOrCreate factory.
FileStatusCache is used to create an InMemoryFileIndex.
Contract¶
getLeafFiles¶
getLeafFiles(
path: Path): Option[Array[FileStatus]]
Default: None (undefined)
Used when:
- InMemoryFileIndex is requested to listLeafFiles
invalidateAll¶
invalidateAll(): Unit
putLeafFiles¶
putLeafFiles(
path: Path,
leafFiles: Array[FileStatus]): Unit
Used when:
- InMemoryFileIndex is requested to listLeafFiles
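The contract above can be illustrated with a dependency-free sketch. Everything here is an assumption made for illustration: `Path` and `FileStatus` are simplified stand-ins for the Hadoop classes, and `FileStatusCacheSketch`, `NoopCacheSketch`, `MapBackedCache` and `ListingSketch.listLeafFiles` are hypothetical names, not Spark's own code. The helper shows one plausible way a caller could combine getLeafFiles and putLeafFiles: consult the cache first and store the listing on a miss.

```scala
import scala.collection.mutable

// Hypothetical stand-ins for Hadoop's Path and FileStatus (keeps the sketch dependency-free).
final case class Path(value: String)
final case class FileStatus(path: Path, length: Long)

// The three-method contract; getLeafFiles defaults to None (undefined).
abstract class FileStatusCacheSketch {
  def getLeafFiles(path: Path): Option[Array[FileStatus]] = None
  def putLeafFiles(path: Path, leafFiles: Array[FileStatus]): Unit
  def invalidateAll(): Unit
}

// A cache that does no caching: lookups always miss, writes are ignored.
object NoopCacheSketch extends FileStatusCacheSketch {
  override def putLeafFiles(path: Path, leafFiles: Array[FileStatus]): Unit = ()
  override def invalidateAll(): Unit = ()
}

// A simple map-backed cache (an assumption; no size limit or eviction).
final class MapBackedCache extends FileStatusCacheSketch {
  private val entries = mutable.Map.empty[Path, Array[FileStatus]]

  override def getLeafFiles(path: Path): Option[Array[FileStatus]] = entries.get(path)
  override def putLeafFiles(path: Path, leafFiles: Array[FileStatus]): Unit =
    entries.update(path, leafFiles)
  override def invalidateAll(): Unit = entries.clear()
}

object ListingSketch {
  // Consult the cache first; on a miss, run the (expensive) listing and store the result.
  def listLeafFiles(cache: FileStatusCacheSketch, path: Path)(
      list: Path => Array[FileStatus]): Array[FileStatus] =
    cache.getLeafFiles(path).getOrElse {
      val files = list(path)
      cache.putLeafFiles(path, files)
      files
    }
}
```

With `MapBackedCache`, a second `ListingSketch.listLeafFiles` call for the same path is served from the cache until invalidateAll is called. With `NoopCacheSketch`, every call runs the listing function again.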
Implementations¶
- NoopCache
- SharedInMemoryCache
Looking Up FileStatusCache¶
getOrCreate(
session: SparkSession): FileStatusCache
getOrCreate creates a SharedInMemoryCache when all the following hold:
- spark.sql.hive.manageFilesourcePartitions is enabled
- spark.sql.hive.filesourcePartitionFileCacheSize is greater than 0
In that case, getOrCreate requests the SharedInMemoryCache to createForNewClient.
Otherwise, getOrCreate returns the NoopCache (that does no caching).
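The selection rule above can be sketched as a pure function. This is an illustration only: `CacheSettings`, `CacheChoice`, `UseNoopCache`, `UseSharedInMemoryCache` and `GetOrCreateSketch.choose` are hypothetical names, and the real getOrCreate reads both settings from the SparkSession rather than from a case class.

```scala
// Hypothetical holder for the two settings that getOrCreate checks.
final case class CacheSettings(
    manageFilesourcePartitions: Boolean, // spark.sql.hive.manageFilesourcePartitions
    cacheSize: Long)                     // spark.sql.hive.filesourcePartitionFileCacheSize

// Hypothetical result type naming which implementation would be handed out.
sealed trait CacheChoice
case object UseNoopCache extends CacheChoice
final case class UseSharedInMemoryCache(cacheSize: Long) extends CacheChoice

object GetOrCreateSketch {
  // Both conditions must hold for the shared cache; any other combination means no caching.
  def choose(settings: CacheSettings): CacheChoice =
    if (settings.manageFilesourcePartitions && settings.cacheSize > 0)
      UseSharedInMemoryCache(settings.cacheSize)
    else
      UseNoopCache
}
```

In this sketch, a cache size of `0` turns caching off even when partition management is enabled, which mirrors the "greater than 0" condition.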
getOrCreate is used when:
- CatalogFileIndex is requested for the FileStatusCache
- DataSource is requested to create an InMemoryFileIndex
- FileTable is requested for the PartitioningAwareFileIndex (for a non-streaming file-based datasource)