InMemoryFileIndex¶
InMemoryFileIndex is a PartitioningAwareFileIndex.
Creating Instance¶
InMemoryFileIndex takes the following to be created:
- SparkSession
- Root Paths (as Hadoop Paths)
- Parameters (Map[String, String])
- User-Defined Schema (Option[StructType])
- FileStatusCache
- User-Defined Partition Spec (default: undefined)
- metadataOpsTimeNs (Option[Long], default: undefined)
While being created, InMemoryFileIndex runs refresh0.
InMemoryFileIndex is created when:
- HiveMetastoreCatalog is requested to inferIfNeeded
- CatalogFileIndex is requested for the partitions by the given predicate expressions for a non-partitioned Hive table
- DataSource is requested to createInMemoryFileIndex
- FileTable is requested for a PartitioningAwareFileIndex
FileStatusCache¶
InMemoryFileIndex can be given a FileStatusCache. Unless given, InMemoryFileIndex uses the NoopCache.
FileStatusCache is given (based on the configuration properties) when:
- CatalogFileIndex is requested to filter the partitions
- DataSource is requested to create an InMemoryFileIndex
- FileTable is requested for the PartitioningAwareFileIndex
Refreshing Cached File Listings¶
refresh(): Unit
refresh requests the FileStatusCache to invalidateAll and then runs refresh0.
refresh is part of the FileIndex abstraction.
Refreshing Cached File Listings (Internal)¶
refresh0(): Unit
refresh0
...FIXME
refresh0 is used when InMemoryFileIndex is created and requested to refresh.
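The interplay between creation, refresh, and the file status cache can be pictured with a small, Spark-free model. This is only a sketch under stated assumptions: ListingCache, NoopListingCache, MapListingCache, FileIndexSketch, and the listDir function are hypothetical names that do not exist in Spark, and the body of refresh0 here simply assumes that it relists the root paths (consulting the cache first), since its actual steps are not described on this page.

```scala
import scala.collection.mutable

// Hypothetical stand-in for FileStatusCache: maps a root path to its listed files.
trait ListingCache {
  def get(path: String): Option[Seq[String]]
  def put(path: String, files: Seq[String]): Unit
  def invalidateAll(): Unit
}

// Stores nothing, so every lookup is a miss (the role NoopCache plays by default).
object NoopListingCache extends ListingCache {
  def get(path: String): Option[Seq[String]] = None
  def put(path: String, files: Seq[String]): Unit = ()
  def invalidateAll(): Unit = ()
}

// Simple in-memory cache, used to make the effect of invalidateAll visible.
final class MapListingCache extends ListingCache {
  private val entries = mutable.Map.empty[String, Seq[String]]
  def get(path: String): Option[Seq[String]] = entries.get(path)
  def put(path: String, files: Seq[String]): Unit = entries.update(path, files)
  def invalidateAll(): Unit = entries.clear()
}

// Simplified model of the index; listDir is an assumed directory-listing function.
final class FileIndexSketch(
    roots: Seq[String],
    listDir: String => Seq[String],
    cache: ListingCache = NoopListingCache) {

  private var listing: Map[String, Seq[String]] = Map.empty

  // The listing is built once while the instance is being created.
  refresh0()

  // Public refresh: drop all cached listings first, then relist.
  def refresh(): Unit = {
    cache.invalidateAll()
    refresh0()
  }

  // Assumed behaviour: relist every root path, reusing cached listings when present.
  private def refresh0(): Unit = {
    listing = roots.map { root =>
      val files = cache.get(root).getOrElse {
        val listed = listDir(root)
        cache.put(root, listed)
        listed
      }
      root -> files
    }.toMap
  }

  def allFiles: Seq[String] =
    roots.flatMap(root => listing.getOrElse(root, Seq.empty))
}
```

In this model, sharing one MapListingCache between two indexes over the same root would let the second one skip the listing, until either of them calls refresh and clears the cache.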
Root Paths¶
rootPaths: Seq[Path]
The root paths, with streaming metadata directories and files (e.g. _spark_metadata) filtered out.
rootPaths is part of the FileIndex abstraction.
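The filtering described above can be illustrated without Spark or Hadoop on the classpath. The following is a minimal sketch, assuming that a path is excluded when it is a _spark_metadata directory or lies underneath one; RootPathsSketch, isUnderMetadataDir, and the use of plain strings instead of Hadoop Paths are illustrative choices, not Spark's own code.

```scala
object RootPathsSketch {
  // Directory name mentioned above as the streaming metadata directory.
  val MetadataDir = "_spark_metadata"

  // True when the path itself, or any of its ancestors, is a metadata directory.
  def isUnderMetadataDir(path: String): Boolean =
    path.split('/').contains(MetadataDir)

  // Keeps only the user-specified paths that are not streaming metadata.
  def rootPaths(specified: Seq[String]): Seq[String] =
    specified.filterNot(isUnderMetadataDir)
}
```

For example, given the paths /data/out/part-0.parquet, /data/out/_spark_metadata and /data/out/_spark_metadata/0, only the first one would remain as a root path in this model.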