CatalogFileIndex

CatalogFileIndex is a FileIndex.

Creating Instance

CatalogFileIndex takes the following to be created:

CatalogFileIndex is created when:

FileStatusCache

CatalogFileIndex creates a FileStatusCache when created.

The FileStatusCache is used when:

Listing Files

listFiles(
  partitionFilters: Seq[Expression],
  dataFilters: Seq[Expression]): Seq[PartitionDirectory]

listFiles filters the partitions by the input partition filters and then requests the resulting file index for the underlying partition files.

listFiles is part of the FileIndex abstraction.
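
The two steps of listFiles (prune partitions, then collect their files) can be sketched with a small model. This is only an illustration under stated assumptions: PartitionDir, PartitionFilter and ListFilesSketch are hypothetical stand-ins rather than Spark classes, partition filters are modelled as plain predicates instead of Catalyst expressions, and data filters are ignored.

```scala
object ListFilesSketch {
  // Hypothetical stand-in for a partition directory: partition values plus files.
  final case class PartitionDir(values: Map[String, String], files: Seq[String])

  // Hypothetical model of a partition filter: a predicate over partition values.
  type PartitionFilter = Map[String, String] => Boolean

  // Step 1: keep only the partitions that satisfy every partition filter.
  // Step 2: the surviving partitions carry their underlying files.
  def listFiles(
      partitions: Seq[PartitionDir],
      partitionFilters: Seq[PartitionFilter]): Seq[PartitionDir] =
    partitions.filter(p => partitionFilters.forall(f => f(p.values)))

  // Sample data, made up for the example.
  val partitions: Seq[PartitionDir] = Seq(
    PartitionDir(Map("year" -> "2023"), Seq("/t/year=2023/part-0.parquet")),
    PartitionDir(
      Map("year" -> "2024"),
      Seq("/t/year=2024/part-0.parquet", "/t/year=2024/part-1.parquet")))

  val only2024: PartitionFilter = _.get("year").contains("2024")
}
```

With no partition filters every partition is kept, which mirrors how an empty filter list selects the whole table.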

Input Files

inputFiles: Array[String]

inputFiles lists all the partitions and then requests them for the input files.

inputFiles is part of the FileIndex abstraction.

Root Paths

rootPaths: Seq[Path]

rootPaths returns the base location converted to a Hadoop Path.

rootPaths is part of the FileIndex abstraction.
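
A minimal sketch of the URI-to-path conversion follows. It assumes the base location may be absent (modelled here as an Option, which is an assumption of this example), and it uses a hypothetical PathModel case class in place of Hadoop's Path so the snippet runs without Hadoop on the classpath.

```scala
import java.net.URI

object RootPathsSketch {
  // Hypothetical stand-in for org.apache.hadoop.fs.Path.
  final case class PathModel(uri: URI)

  // Zero or one root path, depending on whether a base location is defined.
  def rootPaths(baseLocation: Option[URI]): Seq[PathModel] =
    baseLocation.map(PathModel(_)).toSeq
}
```

Under this model a table without a location yields an empty sequence rather than an error.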

Listing Partitions By Given Predicate Expressions

filterPartitions(
  filters: Seq[Expression]): InMemoryFileIndex

filterPartitions requests the CatalogTable for the partition columns.

For a partitioned table, filterPartitions starts timing the partition listing, requests the SessionCatalog for the partitions that match the filters, and creates a PrunedInMemoryFileIndex (with the partition listing time).

For an unpartitioned table (no partition columns defined), filterPartitions simply returns an InMemoryFileIndex (with the base location and no user-specified schema).
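
The branching described above can be sketched as follows. All names here (FileIndexModel, PrunedIndex, WholeTableIndex, listPartitionsByFilter) are hypothetical and only model the control flow. The catalog lookup is passed in as a function so that it runs only for partitioned tables.

```scala
object FilterPartitionsSketch {
  // Hypothetical stand-ins for the two kinds of file index described above.
  sealed trait FileIndexModel
  final case class PrunedIndex(partitions: Seq[String], listingTimeNs: Long)
      extends FileIndexModel
  final case class WholeTableIndex(baseLocation: String) extends FileIndexModel

  def filterPartitions(
      partitionColumns: Seq[String],
      baseLocation: String,
      listPartitionsByFilter: () => Seq[String]): FileIndexModel =
    if (partitionColumns.nonEmpty) {
      // Partitioned table: time the catalog lookup and keep the elapsed time.
      val start = System.nanoTime()
      val selected = listPartitionsByFilter()
      PrunedIndex(selected, System.nanoTime() - start)
    } else {
      // Unpartitioned table: no catalog lookup, index the base location as a whole.
      WholeTableIndex(baseLocation)
    }
}
```

Keeping the listing time on the pruned result lets a caller report how long partition discovery took without measuring it again.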

filterPartitions is used when:

Internal Properties

Base Location

Base location (as a Java URI) as defined in the CatalogTable metadata (under the locationUri of the storage)

Used when CatalogFileIndex is requested to filter the partitions and for the root paths

Hadoop Configuration

Hadoop Configuration

Used when CatalogFileIndex is requested to filter the partitions