CdcAddFileIndex¶
CdcAddFileIndex is a TahoeBatchFileIndex with the following:
| Property | Value |
|---|---|
| Action Type | cdcRead |
| addFiles | The AddFiles of the given CDCDataSpecs |
CdcAddFileIndex is used by CDCReaderImpl to scanIndex.
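As a rough illustration of how the addFiles property could be derived from the per-version CDCDataSpecs, the sketch below flattens them into a single list. The `AddFile` and `CDCDataSpec` case classes here are simplified stand-ins, not the real Delta Lake classes, and the `actions` field name and the `AddFilesSketch` helper are assumptions made for this example.

```scala
// Simplified stand-ins (NOT the real Delta Lake classes) that keep only
// the fields needed for this illustration.
final case class AddFile(path: String)
final case class CDCDataSpec[T](version: Long, actions: Seq[T])

object AddFilesSketch {
  // Flatten the per-version specs into one list of files, preserving the
  // order of versions and the order of files within each version.
  def addFiles(filesByVersion: Seq[CDCDataSpec[AddFile]]): Seq[AddFile] =
    filesByVersion.flatMap(_.actions)
}
```

With two specs (versions 1 and 2), the result is simply the files of version 1 followed by the files of version 2.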
Creating Instance¶
CdcAddFileIndex takes the following to be created:
- SparkSession
- AddFiles by Version (Seq[CDCDataSpec[AddFile]])
- DeltaLog
- Path
- SnapshotDescriptor
- Row Index Filters
CdcAddFileIndex is created when:
- CDCReaderImpl is requested for the DataFrame with deleted and added rows and to processDeletionVectorActions
Row Index Filters¶
SupportsRowIndexFilters
rowIndexFilters: Option[Map[String, RowIndexFilterType]] = None
rowIndexFilters is part of the SupportsRowIndexFilters abstraction.
CdcAddFileIndex is given Row Index Filters when created.
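The pattern described above (an abstraction that defaults to no filters, and an index that receives them at construction) might look like the following minimal sketch. Everything in it is hypothetical: `RowIndexFilterType` is modeled as a small sealed trait rather than Delta Lake's own type, `FileIndexSketch` stands in for CdcAddFileIndex, and the map keys are assumed to be file paths.

```scala
object RowIndexFiltersSketch {
  // Hypothetical stand-in for Delta Lake's RowIndexFilterType.
  sealed trait RowIndexFilterType
  case object IfContained extends RowIndexFilterType
  case object IfNotContained extends RowIndexFilterType

  // Simplified abstraction: no filters unless an implementation supplies them.
  trait SupportsRowIndexFilters {
    def rowIndexFilters: Option[Map[String, RowIndexFilterType]] = None
  }

  // The index takes the filters as a constructor argument (defaulting to
  // None) and exposes them by overriding the abstraction's member.
  final class FileIndexSketch(
      override val rowIndexFilters: Option[Map[String, RowIndexFilterType]] = None)
    extends SupportsRowIndexFilters
}
```

Overriding the member with a constructor `val` keeps the default (`None`) in one place while letting callers pass filters explicitly.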
Input Files¶
inputFiles
...FIXME
Matching Files¶
TahoeFileIndex
matchingFiles(
partitionFilters: Seq[Expression],
dataFilters: Seq[Expression]): Seq[AddFile]
matchingFiles is part of the TahoeFileIndex abstraction.
matchingFiles
...FIXME
Partitions¶
FileIndex
partitionSchema: StructType
partitionSchema is part of the FileIndex (Spark SQL) abstraction.
partitionSchema is the cdcReadSchema for the partitions of (the Metadata of) the given SnapshotDescriptor.
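To make the effect of cdcReadSchema on the partition schema concrete, here is a small sketch that avoids any Spark dependency. It assumes that cdcReadSchema appends the three Change Data Feed metadata columns (`_change_type`, `_commit_version`, `_commit_timestamp`) to the schema it is given; the `Field` case class and the `CdcPartitionSchemaSketch` object are hypothetical stand-ins for Spark's StructField and the real implementation.

```scala
object CdcPartitionSchemaSketch {
  // Minimal stand-in for a schema field: a column name and a type name.
  final case class Field(name: String, dataType: String)

  // Assumed CDF metadata columns, in the order they are appended.
  val cdcColumns: Seq[Field] = Seq(
    Field("_change_type", "string"),
    Field("_commit_version", "long"),
    Field("_commit_timestamp", "timestamp"))

  // Sketch of cdcReadSchema: the original columns followed by the CDF columns.
  def cdcReadSchema(schema: Seq[Field]): Seq[Field] = schema ++ cdcColumns
}
```

Under these assumptions, a table partitioned by a single `date` column would expose a partition schema of four columns to the reader: `date` plus the three CDF columns.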