CdcAddFileIndex
CdcAddFileIndex is a TahoeBatchFileIndex with the following:
| Property | Value |
|---|---|
| Action Type | cdcRead |
| addFiles | The AddFiles of the given CDCDataSpecs |
CdcAddFileIndex is used by CDCReaderImpl to scanIndex.
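The addFiles property in the table above can be pictured as a flattening of the per-version specs. The following is only a minimal sketch: `AddFile` and `CDCDataSpec` here are simplified, hypothetical stand-ins (the real Delta Lake classes carry more fields), and `addFiles` is an illustrative helper, not the actual implementation.

```scala
// Simplified, hypothetical stand-ins for Delta Lake's AddFile and CDCDataSpec.
// The real classes have more fields; only what the sketch needs is modelled.
object CdcAddFilesSketch {
  final case class AddFile(path: String, partitionValues: Map[String, String])
  final case class CDCDataSpec[T](version: Long, actions: Seq[T])

  // Collect the AddFiles of all the per-version specs into a single sequence
  def addFiles(filesByVersion: Seq[CDCDataSpec[AddFile]]): Seq[AddFile] =
    filesByVersion.flatMap(_.actions)

  def main(args: Array[String]): Unit = {
    val specs = Seq(
      CDCDataSpec(1L, Seq(AddFile("part-0.parquet", Map.empty))),
      CDCDataSpec(2L, Seq(
        AddFile("part-1.parquet", Map.empty),
        AddFile("part-2.parquet", Map.empty))))
    println(addFiles(specs).map(_.path).mkString(", "))
  }
}
```

The order of the resulting files follows the order of the specs, so files of earlier versions come first when the specs are sorted by version.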
Creating Instance
CdcAddFileIndex takes the following to be created:
- SparkSession
- AddFiles by Version (Seq[CDCDataSpec[AddFile]])
- DeltaLog
- Path
- SnapshotDescriptor
- Row Index Filters
CdcAddFileIndex is created when:
- CDCReaderImpl is requested for the DataFrame with deleted and added rows and to processDeletionVectorActions
Row Index Filters
SupportsRowIndexFilters
rowIndexFilters: Option[Map[String, RowIndexFilterType]] = None
rowIndexFilters is part of the SupportsRowIndexFilters abstraction.
CdcAddFileIndex is given Row Index Filters when created.
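The shape of rowIndexFilters (an optional map keyed by a string) can be illustrated with a small lookup. This is a sketch under stated assumptions: the `RowIndexFilterType` trait and its two cases below are hypothetical stand-ins for the Delta Lake type, the map keys are assumed to identify files by path, and `filterTypeOf` is an illustrative helper that does not exist in Delta Lake.

```scala
object RowIndexFiltersSketch {
  // Hypothetical stand-in for Delta Lake's RowIndexFilterType
  sealed trait RowIndexFilterType
  case object IfContained extends RowIndexFilterType
  case object IfNotContained extends RowIndexFilterType

  // Look up the filter type of a file.
  // Returns None when no filters were given at all (the default)
  // or when the given file has no entry in the map.
  def filterTypeOf(
      rowIndexFilters: Option[Map[String, RowIndexFilterType]],
      path: String): Option[RowIndexFilterType] =
    rowIndexFilters.flatMap(_.get(path))
}
```

With the default `None`, every lookup yields `None`, which matches an index created without any row index filters.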
Input Files
inputFiles...FIXME
Matching Files
TahoeFileIndex
matchingFiles(
partitionFilters: Seq[Expression],
dataFilters: Seq[Expression]): Seq[AddFile]
matchingFiles is part of the TahoeFileIndex abstraction.
matchingFiles...FIXME
Partitions
FileIndex
partitionSchema: StructType
partitionSchema is part of the FileIndex (Spark SQL) abstraction.
partitionSchema is the cdcReadSchema of the partition schema of (the Metadata of) the given SnapshotDescriptor.
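The effect of cdcReadSchema on a partition schema can be sketched without a Spark dependency. The sketch assumes that cdcReadSchema appends the three CDC metadata columns (`_change_type`, `_commit_version`, `_commit_timestamp`) to the schema it is given. `Field` is a hypothetical stand-in for Spark SQL's StructField, and a `Seq[Field]` stands in for StructType.

```scala
object CdcReadSchemaSketch {
  // Hypothetical stand-in for Spark SQL's StructField (name and type only)
  final case class Field(name: String, dataType: String)

  // Assumption: the CDC read schema is the input schema
  // followed by the three CDC metadata columns
  def cdcReadSchema(schema: Seq[Field]): Seq[Field] =
    schema ++ Seq(
      Field("_change_type", "string"),
      Field("_commit_version", "long"),
      Field("_commit_timestamp", "timestamp"))

  def main(args: Array[String]): Unit = {
    // A table partitioned by a single "date" column
    val partitionSchema = Seq(Field("date", "string"))
    cdcReadSchema(partitionSchema).foreach(f => println(s"${f.name}: ${f.dataType}"))
  }
}
```

Under this assumption, the CDC metadata columns are exposed as if they were partition columns, alongside the real partition columns of the table.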