InMemoryFileIndex¶
InMemoryFileIndex is a PartitioningAwareFileIndex.
Creating Instance¶
InMemoryFileIndex takes the following to be created:
- SparkSession
- Root Paths (as Hadoop Paths)
- Parameters (Map[String, String])
- User-Defined Schema (Option[StructType])
- FileStatusCache
- User-Defined Partition Spec (default: undefined)
- metadataOpsTimeNs (Option[Long], default: undefined)
While being created, InMemoryFileIndex runs refresh0.
InMemoryFileIndex is created when:
- HiveMetastoreCatalog is requested to inferIfNeeded
- CatalogFileIndex is requested for the partitions by the given predicate expressions for a non-partitioned Hive table
- DataSource is requested to createInMemoryFileIndex
- FileTable is requested for a PartitioningAwareFileIndex
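The constructor arguments above can be tried out directly. The following is a minimal sketch, assuming Spark SQL and Hadoop are on the classpath (e.g. pasted into spark-shell); the temporary directory, the file name and the application name are made up for illustration. Only the first four arguments are passed, so FileStatusCache, User-Defined Partition Spec and metadataOpsTimeNs keep their defaults.

```scala
import java.nio.file.Files

import org.apache.hadoop.fs.Path
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.datasources.InMemoryFileIndex

// In spark-shell, getOrCreate simply returns the existing session.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("InMemoryFileIndexDemo") // hypothetical name
  .getOrCreate()

// A throwaway directory with a single data file (illustration only).
val dir = Files.createTempDirectory("imfi-demo")
Files.write(dir.resolve("part-00000.txt"), "hello".getBytes("UTF-8"))

// SparkSession, root paths, parameters, user-defined schema.
// The remaining constructor arguments are left at their defaults.
val index = new InMemoryFileIndex(
  spark,
  Seq(new Path(dir.toString)),
  Map.empty[String, String],
  None)

// The files were listed while the instance was being created.
index.inputFiles.foreach(println)
```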
FileStatusCache¶
InMemoryFileIndex can be given a FileStatusCache. Unless given, InMemoryFileIndex uses the NoopCache.
FileStatusCache is given (based on the configuration properties) when:
- CatalogFileIndex is requested to filter the partitions
- DataSource is requested to create an InMemoryFileIndex
- FileTable is requested for the PartitioningAwareFileIndex
Refreshing Cached File Listings¶
refresh(): Unit
refresh requests the FileStatusCache to invalidateAll and then calls refresh0.
refresh is part of the FileIndex abstraction.
Refreshing Cached File Listings (Internal)¶
refresh0(): Unit
refresh0...FIXME
refresh0 is used when InMemoryFileIndex is created and requested to refresh.
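The relationship between refresh, refresh0 and the FileStatusCache can be illustrated with a toy model in plain Scala. This is not Spark's code: ToyFileStatusCache, ToyFileIndex and listFiles are hypothetical names, and the body of refresh0 is an assumption (relist every root path, consulting the cache first), since the text above leaves it unspecified. The sketch only shows why invalidating the cache before relisting is what makes new files visible.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a FileStatusCache: root path -> cached listing.
final class ToyFileStatusCache {
  private val entries = mutable.Map.empty[String, Seq[String]]
  def get(path: String): Option[Seq[String]] = entries.get(path)
  def put(path: String, files: Seq[String]): Unit = entries.update(path, files)
  def invalidateAll(): Unit = entries.clear()
}

// Hypothetical index; `listFiles` stands in for a file system listing.
final class ToyFileIndex(
    rootPaths: Seq[String],
    cache: ToyFileStatusCache,
    listFiles: String => Seq[String]) {

  private var leafFiles: Seq[String] = Seq.empty

  // Listing happens once while the instance is being created.
  refresh0()

  // Drop every cached listing, then list again.
  def refresh(): Unit = {
    cache.invalidateAll()
    refresh0()
  }

  // Assumed behaviour: reuse a cached listing when present, else list and cache.
  private def refresh0(): Unit = {
    leafFiles = rootPaths.flatMap { root =>
      cache.get(root).getOrElse {
        val listed = listFiles(root)
        cache.put(root, listed)
        listed
      }
    }
  }

  def inputFiles: Seq[String] = leafFiles
}

// Usage: a file added after creation only shows up after refresh().
var disk = Map("/data" -> Seq("/data/a.parquet"))
val toyCache = new ToyFileStatusCache
val toyIndex = new ToyFileIndex(Seq("/data"), toyCache, root => disk.getOrElse(root, Seq.empty))

disk = Map("/data" -> Seq("/data/a.parquet", "/data/b.parquet"))
val beforeRefresh = toyIndex.inputFiles
toyIndex.refresh()
val afterRefresh = toyIndex.inputFiles
```

In this model, calling refresh0 alone after the file system changed would keep returning the cached listing, which is why refresh invalidates the cache first.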
Root Paths¶
rootPaths: Seq[Path]
The root paths given at creation, with streaming metadata directories and files (e.g. _spark_metadata) filtered out.
rootPaths is part of the FileIndex abstraction.
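The filtering described above can be sketched in plain Scala. This is a simplified illustration, not Spark's implementation: paths are plain strings rather than Hadoop Paths, and RootPathsSketch, isUnderMetadataDir and rootPaths are hypothetical helpers. The assumption is that a path is dropped when it is, or lies under, a directory named _spark_metadata.

```scala
object RootPathsSketch {
  // Name of the streaming metadata directory mentioned in the text.
  val MetadataDir = "_spark_metadata"

  // Hypothetical helper: true if any component of the path is the metadata directory.
  def isUnderMetadataDir(path: String): Boolean =
    path.split('/').contains(MetadataDir)

  // Keep only the root paths that are outside streaming metadata directories.
  def rootPaths(specified: Seq[String]): Seq[String] =
    specified.filterNot(isUnderMetadataDir)
}

val specified = Seq(
  "/data/events",
  "/data/events/_spark_metadata",
  "/data/events/_spark_metadata/0")

val kept = RootPathsSketch.rootPaths(specified)
```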