PartitioningAwareFileIndex

PartitioningAwareFileIndex is an extension of the FileIndex abstraction for indices that are aware of partitioned tables.

Contract

leafDirToChildrenFiles

leafDirToChildrenFiles: Map[Path, Array[FileStatus]]

Used to list the files matching filters, to list all files, and to infer partitioning

Leaf Files

leafFiles: mutable.LinkedHashMap[Path, FileStatus]

Used to list all files and to determine the base locations
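
The two contract members above are closely related: one is keyed by file path, the other by leaf directory. The following is a minimal sketch of how a concrete index could derive one from the other. It is not taken from the Spark sources; `SimpleFileStatus`, `parentOf`, and the sample paths are hypothetical stand-ins (plain strings replace Hadoop `Path` and `FileStatus`) so the snippet runs without Spark or Hadoop on the classpath.

```scala
import scala.collection.mutable

object LeafFilesSketch {
  // Hypothetical stand-in for Hadoop's FileStatus (path and length only)
  final case class SimpleFileStatus(path: String, length: Long)

  // Parent directory of a '/'-separated path (assumes at least one '/')
  private def parentOf(path: String): String =
    path.substring(0, path.lastIndexOf('/'))

  private val discovered = Seq(
    SimpleFileStatus("/data/t/year=2023/part-0.parquet", 1024L),
    SimpleFileStatus("/data/t/year=2024/part-0.parquet", 2048L),
    SimpleFileStatus("/data/t/year=2024/part-1.parquet", 512L))

  // Counterpart of leafFiles: file path -> file status, in discovery order
  val leafFiles: mutable.LinkedHashMap[String, SimpleFileStatus] =
    mutable.LinkedHashMap(discovered.map(f => f.path -> f): _*)

  // Counterpart of leafDirToChildrenFiles: leaf directory -> files in it
  val leafDirToChildrenFiles: Map[String, Array[SimpleFileStatus]] =
    leafFiles.values.toArray.groupBy(f => parentOf(f.path))
}
```

Grouping by parent directory is an assumption made for illustration; a concrete implementation is free to populate both members in any way that honors the contract.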

PartitionSpec

partitionSpec(): PartitionSpec

Partition specification with partition columns and values, and directories (as Hadoop Paths)

Used for the partition schema, and to list the files matching filters and all files

Implementations

Creating Instance

PartitioningAwareFileIndex takes the following to be created:

  • SparkSession
  • Options for partition discovery (Map[String, String])
  • Optional User-Defined Schema
  • FileStatusCache (default: NoopCache)

Abstract Class

PartitioningAwareFileIndex is an abstract class and cannot be created directly. It is created indirectly as one of the concrete PartitioningAwareFileIndices.

All Files

allFiles(): Seq[FileStatus]

allFiles...FIXME


allFiles is used when:

Files Matching Filters

listFiles(
  partitionFilters: Seq[Expression],
  dataFilters: Seq[Expression]): Seq[PartitionDirectory]

listFiles is part of the FileIndex abstraction.


listFiles...FIXME

Partition Schema

partitionSchema: StructType

partitionSchema is part of the FileIndex abstraction.


partitionSchema gives the partitionColumns of the partition specification.
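
As a rough illustration of that relationship, the sketch below models a partition specification with simplified types and reads the partition schema off it. `Column`, `PartitionPathSketch`, and `PartitionSpecSketch` are hypothetical stand-ins (the real types build on `StructType` and Hadoop `Path`), and the sample values are made up.

```scala
object PartitionSchemaSketch {
  // Hypothetical stand-in for a StructField: column name and type name
  final case class Column(name: String, dataType: String)

  // One partition directory together with its partition values
  final case class PartitionPathSketch(values: Seq[Any], path: String)

  // Partition columns plus the directories that carry their values
  final case class PartitionSpecSketch(
      partitionColumns: Seq[Column],
      partitions: Seq[PartitionPathSketch])

  // Stand-in for partitionSpec()
  def partitionSpec(): PartitionSpecSketch = PartitionSpecSketch(
    partitionColumns = Seq(Column("year", "int"), Column("month", "int")),
    partitions = Seq(
      PartitionPathSketch(Seq(2024, 1), "/data/t/year=2024/month=1"),
      PartitionPathSketch(Seq(2024, 2), "/data/t/year=2024/month=2")))

  // partitionSchema is simply the partition columns of the specification
  def partitionSchema: Seq[Column] = partitionSpec().partitionColumns
}
```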

Input Files

inputFiles: Array[String]

inputFiles is part of the FileIndex abstraction.


inputFiles takes all the files and returns the location of each one (a Hadoop Path converted to a String).
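
A minimal sketch of that mapping, assuming a simplified model rather than the actual Spark code: `SimpleFileStatus` is a hypothetical stand-in for Hadoop's `FileStatus`, and `allFiles` is hard-coded sample data.

```scala
object InputFilesSketch {
  // Hypothetical stand-in for Hadoop's FileStatus (path and length only)
  final case class SimpleFileStatus(path: String, length: Long)

  // Stand-in for allFiles(): every data file known to the index
  def allFiles(): Seq[SimpleFileStatus] = Seq(
    SimpleFileStatus("hdfs://nn/data/t/year=2023/part-0.parquet", 1024L),
    SimpleFileStatus("hdfs://nn/data/t/year=2024/part-0.parquet", 2048L))

  // One location string per file, in the order of allFiles()
  def inputFiles: Array[String] = allFiles().map(_.path).toArray
}
```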

Size

sizeInBytes: Long

sizeInBytes is part of the FileIndex abstraction.


sizeInBytes sums up the length (in bytes) of all the files.
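
The summation can be sketched as follows. This is an illustrative model under stated assumptions, not the Spark implementation: `SimpleFileStatus` is a hypothetical stand-in for Hadoop's `FileStatus`, and the file list is sample data.

```scala
object SizeInBytesSketch {
  // Hypothetical stand-in for Hadoop's FileStatus (path and length only)
  final case class SimpleFileStatus(path: String, length: Long)

  // Stand-in for allFiles(): every data file known to the index
  def allFiles(): Seq[SimpleFileStatus] = Seq(
    SimpleFileStatus("/data/t/year=2023/part-0.parquet", 1024L),
    SimpleFileStatus("/data/t/year=2024/part-0.parquet", 2048L))

  // Total size is the sum of the individual file lengths (in bytes)
  def sizeInBytes: Long = allFiles().map(_.length).sum
}
```

Because the total is computed from file lengths alone, it reflects the on-disk size of the files and says nothing about the number of rows they hold.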

Inferring Partitioning

inferPartitioning(): PartitionSpec

inferPartitioning...FIXME


inferPartitioning is used by the PartitioningAwareFileIndices.

Base Locations

basePaths: Set[Path]

basePaths is used to infer partitioning.

basePaths...FIXME