

PartitioningAwareFileIndex is an extension of the FileIndex abstraction for indices that are aware of partitioned tables.



leafDirToChildrenFiles: Map[Path, Array[FileStatus]]

Used to list the files matching filters, to list all files, and to infer partitioning

Leaf Files

leafFiles: mutable.LinkedHashMap[Path, FileStatus]

Used to list all files and to determine the base locations


partitionSpec(): PartitionSpec

Partition specification with partition columns and values, and directories (as Hadoop Paths)

Used to determine the partition schema and to list the files matching filters and all files


Creating Instance

PartitioningAwareFileIndex takes the following to be created:

  • SparkSession
  • Options for partition discovery (Map[String, String])
  • Optional User-Defined Schema
  • FileStatusCache (default: NoopCache)

Abstract Class

PartitioningAwareFileIndex is an abstract class and cannot be created directly. It is created indirectly when one of the concrete PartitioningAwareFileIndexes is created.

All Files

allFiles(): Seq[FileStatus]


allFiles is used when PartitioningAwareFileIndex is requested for the input files and the size in bytes.

Files Matching Filters

listFiles(
  partitionFilters: Seq[Expression],
  dataFilters: Seq[Expression]): Seq[PartitionDirectory]


listFiles is part of the FileIndex abstraction.
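
The idea behind listing files that match partition filters can be illustrated with a small, Spark-free model. This is only a sketch under stated assumptions: `FileEntry`, `PartitionDir` and the predicate-based `listFiles` below are hypothetical stand-ins (not Spark's `FileStatus`, `PartitionDirectory` or `Expression`), and the sketch assumes that partition filters prune whole partition directories before their files are returned.

```scala
// Simplified model for illustration only; every name here is hypothetical.
object ListFilesSketch {
  // Stand-in for a file with its size in bytes.
  final case class FileEntry(path: String, length: Long)

  // Stand-in for a partition directory: partition values plus the files it holds.
  final case class PartitionDir(values: Map[String, String], files: Seq[FileEntry])

  // Keeps only the partition directories whose partition values
  // satisfy the given predicate (the "partition filter").
  def listFiles(
      partitions: Seq[PartitionDir],
      partitionFilter: Map[String, String] => Boolean): Seq[PartitionDir] =
    partitions.filter(p => partitionFilter(p.values))
}
```

With two partitions `year=2023` and `year=2024`, a predicate such as `_.get("year").contains("2024")` would leave only the second directory, so the files of the first one are never touched.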

Partition Schema

partitionSchema: StructType

partitionSchema gives the partitionColumns of the partition specification.

partitionSchema is part of the FileIndex abstraction.

Input Files

inputFiles: Array[String]

inputFiles requests all the files for their location (as Hadoop Paths converted to Strings).

inputFiles is part of the FileIndex abstraction.


sizeInBytes: Long

sizeInBytes sums up the length (in bytes) of all the files.

sizeInBytes is part of the FileIndex abstraction.
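
Both inputFiles and sizeInBytes are described above as simple derivations over all the files. A minimal sketch of that relationship, assuming a hypothetical `FileEntry` type in place of Hadoop's `FileStatus` and a plain `Seq` in place of the result of allFiles:

```scala
// Simplified model for illustration only; FileEntry is a hypothetical
// stand-in for a Hadoop FileStatus (a location plus a length in bytes).
object FileIndexSketch {
  final case class FileEntry(path: java.net.URI, length: Long)

  // Locations of all the files, converted to Strings.
  def inputFiles(allFiles: Seq[FileEntry]): Array[String] =
    allFiles.map(_.path.toString).toArray

  // Total length (in bytes) of all the files.
  def sizeInBytes(allFiles: Seq[FileEntry]): Long =
    allFiles.map(_.length).sum
}
```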

Inferring Partitioning

inferPartitioning(): PartitionSpec


inferPartitioning is used by the PartitioningAwareFileIndexes.
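
To give a feel for what inferring partitioning from directories involves, the sketch below extracts `column=value` pairs from the directory segments found under a base location. It is a simplified illustration, not Spark's implementation: `parsePartition` is a hypothetical helper, paths are plain Strings rather than Hadoop Paths, and concerns such as value type inference or escaping are left out.

```scala
// Simplified model for illustration only; parsePartition is a hypothetical helper.
object InferPartitioningSketch {
  // Returns the (column, value) pairs encoded in the directory segments
  // of `dir` that sit below `basePath`, in path order.
  def parsePartition(basePath: String, dir: String): Seq[(String, String)] = {
    // Path of the leaf directory relative to the base location.
    val relative = dir.stripPrefix(basePath).stripPrefix("/")
    relative
      .split("/")
      .toSeq
      .map(_.split("=", 2)) // split each segment at the first '=' only
      .collect { case Array(column, value) if column.nonEmpty => column -> value }
  }
}
```

Segments that do not look like `column=value` are ignored, so a leaf directory directly under the base location yields no partition values.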

Base Locations

basePaths: Set[Path]


basePaths is used to infer partitioning.
