PartitioningAwareFileIndex¶
PartitioningAwareFileIndex is an extension of the FileIndex abstraction for indices that are aware of partitioned tables.
Contract¶
leafDirToChildrenFiles¶
leafDirToChildrenFiles: Map[Path, Array[FileStatus]]
Used to list the files matching filters, to list all files, and to infer partitioning
Leaf Files¶
leafFiles: mutable.LinkedHashMap[Path, FileStatus]
Used for all files and base locations
PartitionSpec¶
partitionSpec(): PartitionSpec
Partition specification with partition columns and values, and directories (as Hadoop Paths)
Used for a partition schema, to list the files matching filters and all files
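The shape of a partition specification can be illustrated with a minimal plain-Scala sketch. It is a simplified model, not Spark's implementation: PartitionDir, Spec, parseDir and specOf are hypothetical names, plain strings stand in for Spark's schema and row types, and the sketch assumes Hive-style column=value directory names.

```scala
object PartitionSpecSketch {
  // Hypothetical stand-in for one partition directory:
  // the (column, value) pairs found in the path, plus the path itself.
  final case class PartitionDir(values: Seq[(String, String)], path: String)

  // Hypothetical stand-in for a partition specification:
  // the partition column names and all partition directories.
  final case class Spec(partitionColumns: Seq[String], partitions: Seq[PartitionDir])

  // Collects the `column=value` segments of a directory path, in path order.
  def parseDir(path: String): PartitionDir = {
    val values = path.split("/").toSeq.collect {
      case segment if segment.contains("=") =>
        val i = segment.indexOf('=')
        segment.substring(0, i) -> segment.substring(i + 1)
    }
    PartitionDir(values, path)
  }

  // Builds a specification; the column names are taken from the first directory
  // (assumption: every directory uses the same columns in the same order).
  def specOf(dirs: Seq[String]): Spec = {
    val partitions = dirs.map(parseDir)
    val columns = partitions.headOption.toSeq.flatMap(_.values.map(_._1))
    Spec(columns, partitions)
  }
}
```

In this model, the partition schema of an index corresponds to Spec.partitionColumns, while the directories are what a file listing would be filtered against.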
Implementations¶
- InMemoryFileIndex
- MetadataLogFileIndex (Spark Structured Streaming)
Creating Instance¶
PartitioningAwareFileIndex takes the following to be created:
- SparkSession
- Options for partition discovery (Map[String, String])
- Optional User-Defined Schema
- FileStatusCache (default: NoopCache)
Abstract Class
PartitioningAwareFileIndex is an abstract class and cannot be created directly. It is created indirectly for the concrete PartitioningAwareFileIndexes.
All Files¶
allFiles(): Seq[FileStatus]
allFiles ...FIXME
allFiles is used when:
- DataSource is requested to getOrInferFileFormatSchema and resolveRelation
- PartitioningAwareFileIndex is requested for files matching filters, input files, and size
- FileTable is requested for a data schema
Files Matching Filters¶
listFiles(
partitionFilters: Seq[Expression],
dataFilters: Seq[Expression]): Seq[PartitionDirectory]
listFiles is part of the FileIndex abstraction.
listFiles ...FIXME
Partition Schema¶
partitionSchema: StructType
partitionSchema is part of the FileIndex abstraction.
partitionSchema gives the partitionColumns of the partition specification.
Input Files¶
inputFiles: Array[String]
inputFiles is part of the FileIndex abstraction.
inputFiles requests all the files for their locations (as Hadoop Paths converted to Strings).
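The path-to-string conversion can be sketched in plain Scala without any Spark or Hadoop dependency. FileEntry is a hypothetical stand-in for Hadoop's FileStatus, and java.net.URI stands in for a Hadoop Path; the sketch only illustrates the mapping and is not the actual implementation.

```scala
import java.net.URI

object InputFilesSketch {
  // Hypothetical stand-in for Hadoop's FileStatus (only the fields needed here).
  final case class FileEntry(path: URI, length: Long)

  // Maps every file to the string form of its location, preserving order.
  def inputFiles(allFiles: Seq[FileEntry]): Array[String] =
    allFiles.map(_.path.toString).toArray
}
```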
Size¶
sizeInBytes: Long
sizeInBytes is part of the FileIndex abstraction.
sizeInBytes sums up the length (in bytes) of all the files.
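The summation can be sketched in plain Scala. FileEntry is again a hypothetical stand-in for Hadoop's FileStatus, so this is an illustration of the idea under that assumption rather than Spark's code.

```scala
object SizeSketch {
  // Hypothetical stand-in for Hadoop's FileStatus (only the fields needed here).
  final case class FileEntry(path: String, length: Long)

  // Adds up the lengths (in bytes) of all the files; an empty listing gives 0.
  def sizeInBytes(allFiles: Seq[FileEntry]): Long =
    allFiles.map(_.length).sum
}
```

Because the size is derived from the complete file listing, it reflects every file the index knows about, regardless of any partition filters applied later.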
Inferring Partitioning¶
inferPartitioning(): PartitionSpec
inferPartitioning ...FIXME
inferPartitioning is used by the PartitioningAwareFileIndices.
Base Locations¶
basePaths: Set[Path]
basePaths is used to infer partitioning.
basePaths ...FIXME