PartitioningAwareFileIndex¶
PartitioningAwareFileIndex is an extension of the FileIndex abstraction for indices that are aware of partitioned tables.
Contract¶
leafDirToChildrenFiles¶
leafDirToChildrenFiles: Map[Path, Array[FileStatus]]
Used for files matching filters, all files, and to infer partitioning
Leaf Files¶
leafFiles: mutable.LinkedHashMap[Path, FileStatus]
Used for all files and base locations
PartitionSpec¶
partitionSpec(): PartitionSpec
Partition specification with partition columns and values, and directories (as Hadoop Paths)
Used for a partition schema, to list the files matching filters and all files
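The shape of the three contract members can be illustrated with a small, dependency-free model. This is only a sketch: `FileEntry`, `PartitionDir`, and `Spec` are hypothetical stand-ins for Hadoop's `FileStatus` and Spark's partition specification classes, and deriving the directory map by grouping leaf files by their parent directory is an assumption about how an implementation might do it, not something this page states.

```scala
// Simplified model of the contract; none of these types are Spark or Hadoop classes.
object ContractSketch {
  // Hypothetical stand-in for a file status: a location and a length in bytes
  final case class FileEntry(path: String, length: Long) {
    // Parent directory: everything before the last '/'
    def parent: String = path.substring(0, path.lastIndexOf('/'))
  }

  // Model of leafDirToChildrenFiles: leaf directory -> files directly under it
  // (grouping by parent is an assumption made for this sketch)
  def leafDirToChildren(leafFiles: Seq[FileEntry]): Map[String, Seq[FileEntry]] =
    leafFiles.groupBy(_.parent)

  // Model of a partition specification: partition column names,
  // plus one entry (values and directory) per partition
  final case class PartitionDir(values: Seq[String], dir: String)
  final case class Spec(partitionColumns: Seq[String], partitions: Seq[PartitionDir])

  // Model of leafFiles for a table partitioned by a single column, year
  val files: Seq[FileEntry] = Seq(
    FileEntry("/data/year=2023/part-0.parquet", 100L),
    FileEntry("/data/year=2023/part-1.parquet", 50L),
    FileEntry("/data/year=2024/part-0.parquet", 70L))

  val byDir: Map[String, Seq[FileEntry]] = leafDirToChildren(files)

  val spec: Spec = Spec(
    partitionColumns = Seq("year"),
    partitions = Seq(
      PartitionDir(Seq("2023"), "/data/year=2023"),
      PartitionDir(Seq("2024"), "/data/year=2024")))
}
```

In this model, every directory in `spec.partitions` is also a key of `byDir`, which is what lets a partition-aware index go from a partition to the files stored under it.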
Implementations¶
- InMemoryFileIndex
- MetadataLogFileIndex (Spark Structured Streaming)
Creating Instance¶
PartitioningAwareFileIndex takes the following to be created:
- SparkSession
- Options for partition discovery (Map[String, String])
- Optional User-Defined Schema
- FileStatusCache (default: NoopCache)
Abstract Class
PartitioningAwareFileIndex is an abstract class and cannot be created directly. It is created indirectly when one of the concrete PartitioningAwareFileIndexes is created.
All Files¶
allFiles(): Seq[FileStatus]
allFiles...FIXME
allFiles is used when:
- DataSource is requested to getOrInferFileFormatSchema and resolveRelation
- PartitioningAwareFileIndex is requested for files matching filters, input files, and size
- FileTable is requested for a data schema
Files Matching Filters¶
listFiles(
  partitionFilters: Seq[Expression],
  dataFilters: Seq[Expression]): Seq[PartitionDirectory]
listFiles is part of the FileIndex abstraction.
listFiles...FIXME
Partition Schema¶
partitionSchema: StructType
partitionSchema is part of the FileIndex abstraction.
partitionSchema gives the partitionColumns of the partition specification.
Input Files¶
inputFiles: Array[String]
inputFiles is part of the FileIndex abstraction.
inputFiles requests all the files for their location (as Hadoop Paths converted to Strings).
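The mapping described above can be sketched without Spark on the classpath. `FileEntry` is a hypothetical stand-in for Hadoop's `FileStatus`, and `java.net.URI` stands in for a Hadoop `Path`; the sketch only mirrors the "location converted to String" step.

```scala
// Simplified model of inputFiles; FileEntry is not a Spark or Hadoop class.
object InputFilesSketch {
  // Hypothetical stand-in for a file status: a location and a length in bytes
  final case class FileEntry(path: java.net.URI, length: Long)

  // Take every file's location and render it as a String
  def inputFiles(allFiles: Seq[FileEntry]): Array[String] =
    allFiles.map(_.path.toString).toArray
}
```

For example, a single entry created with `new java.net.URI("file:/data/a.parquet")` yields an array holding the string `file:/data/a.parquet`.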
Size¶
sizeInBytes: Long
sizeInBytes is part of the FileIndex abstraction.
sizeInBytes sums up the length (in bytes) of all the files.
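As a minimal sketch of that summation, again with a hypothetical `FileEntry` standing in for Hadoop's `FileStatus`: lengths are kept as `Long` so that the total of many large files does not overflow an `Int`.

```scala
// Simplified model of sizeInBytes; FileEntry is not a Spark or Hadoop class.
object SizeSketch {
  // Hypothetical stand-in for a file status: a location and a length in bytes
  final case class FileEntry(path: String, length: Long)

  // Sum the length (in bytes) of all the files; an empty listing gives 0
  def sizeInBytes(allFiles: Seq[FileEntry]): Long =
    allFiles.map(_.length).sum
}
```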
Inferring Partitioning¶
inferPartitioning(): PartitionSpec
inferPartitioning...FIXME
inferPartitioning is used by the PartitioningAwareFileIndices.
Base Locations¶
basePaths: Set[Path]
basePaths is used to infer partitioning.
basePaths...FIXME