InMemoryFileIndex

InMemoryFileIndex is a PartitioningAwareFileIndex.

Creating Instance

InMemoryFileIndex takes the following to be created:

  • SparkSession
  • Root Paths (as Hadoop Paths)
  • Parameters (Map[String, String])
  • User-Defined Schema (Option[StructType])
  • FileStatusCache
  • User-Defined Partition Spec (default: undefined)
  • metadataOpsTimeNs (Option[Long], default: undefined)
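The arguments above can be illustrated with a minimal sketch. It assumes Spark 3.x on the classpath and a scratch directory created just for the demo (`dir` and the application name are illustrative, not part of the original text). InMemoryFileIndex is an internal class, so the constructor may differ across Spark versions. The last three arguments are left at their defaults.

```scala
import java.nio.file.Files

import org.apache.hadoop.fs.Path
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.datasources.InMemoryFileIndex

// Local session for the demo only
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("InMemoryFileIndexDemo")
  .getOrCreate()

// Write a few parquet files into a scratch directory so there is something to list
val dir = Files.createTempDirectory("imfi-demo").toString
spark.range(5).write.mode("overwrite").parquet(dir)

// FileStatusCache, user-defined partition spec and metadataOpsTimeNs use their defaults
val index = new InMemoryFileIndex(
  spark,
  Seq(new Path(dir)),        // root paths
  Map.empty[String, String], // parameters
  None)                      // user-defined schema

// The files were already listed while the instance was being created
index.inputFiles.foreach(println)
```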

While being created, InMemoryFileIndex runs refresh0.

InMemoryFileIndex is created when:

FileStatusCache

InMemoryFileIndex can be given a FileStatusCache. Unless given, InMemoryFileIndex uses the NoopCache.

FileStatusCache is given (based on the configuration properties) when:

Refreshing Cached File Listings

refresh(): Unit

refresh requests the FileStatusCache to invalidateAll and then runs refresh0.

refresh is part of the FileIndex abstraction.
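The invalidate-then-relist flow can be sketched with a toy model in plain Scala (no Spark dependency). ToyFileStatusCache, ToyFileIndex and listFiles are hypothetical stand-ins, not Spark classes, and the body of refresh0 here is an assumption (a relisting of the root paths through the cache), since the text does not describe what refresh0 actually does.

```scala
import scala.collection.mutable

// Hypothetical stand-in for FileStatusCache: root path -> cached listing
final class ToyFileStatusCache {
  private val entries = mutable.Map.empty[String, Seq[String]]
  def get(path: String): Option[Seq[String]] = entries.get(path)
  def put(path: String, files: Seq[String]): Unit = entries(path) = files
  def invalidateAll(): Unit = entries.clear()
}

// Hypothetical stand-in for InMemoryFileIndex; listFiles plays the file system
final class ToyFileIndex(
    rootPaths: Seq[String],
    cache: ToyFileStatusCache,
    listFiles: String => Seq[String]) {

  private var cachedFiles: Seq[String] = Seq.empty

  refresh0() // the listing happens once while the instance is created

  def allFiles: Seq[String] = cachedFiles

  // Drop every cached listing, then list again
  def refresh(): Unit = {
    cache.invalidateAll()
    refresh0()
  }

  // Assumption: relist each root path, consulting the cache first
  private def refresh0(): Unit = {
    cachedFiles = rootPaths.flatMap { root =>
      cache.get(root).getOrElse {
        val listed = listFiles(root)
        cache.put(root, listed)
        listed
      }
    }
  }
}

// A mutable map simulates a directory whose contents change over time
val fs = mutable.Map("/data" -> Seq("/data/a.parquet"))
val toyIndex = new ToyFileIndex(
  Seq("/data"), new ToyFileStatusCache, root => fs.getOrElse(root, Seq.empty))

fs("/data") = Seq("/data/a.parquet", "/data/b.parquet")
val before = toyIndex.allFiles // still the listing captured at creation
toyIndex.refresh()
val after = toyIndex.allFiles  // picks up the new file
```

In this model a new file stays invisible until refresh is called, which is why the cache has to be invalidated before the relisting.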

Refreshing Cached File Listings (Internal)

refresh0(): Unit

refresh0...FIXME

refresh0 is used when InMemoryFileIndex is created and requested to refresh.

Root Paths

rootPaths: Seq[Path]

rootPaths is the root paths (given when InMemoryFileIndex was created) with streaming metadata directories and files (e.g. _spark_metadata) filtered out.

rootPaths is part of the FileIndex abstraction.
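The filtering can be approximated in plain Scala as follows. This is a simplified sketch on string paths, assuming that a path counts as streaming metadata when any of its segments is _spark_metadata. The helper names isUnderMetadataDir and filterRootPaths are hypothetical and do not come from Spark.

```scala
// Directory name used for streaming metadata (from the text above)
val MetadataDir = "_spark_metadata"

// Hypothetical helper: true if the path is a metadata directory or lies inside one
def isUnderMetadataDir(path: String): Boolean =
  path.split('/').contains(MetadataDir)

// Hypothetical helper: keep only the root paths outside metadata directories
def filterRootPaths(specified: Seq[String]): Seq[String] =
  specified.filterNot(isUnderMetadataDir)

val filtered = filterRootPaths(Seq(
  "/data/events",
  "/data/events/_spark_metadata",
  "/data/events/_spark_metadata/0",
  "/data/other"))
```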