TahoeBatchFileIndex¶
TahoeBatchFileIndex is a TahoeFileIndexWithSnapshotDescriptor of a delta table at a given version.
Creating Instance¶
TahoeBatchFileIndex takes the following to be created:
TahoeBatchFileIndex is created when:
- DeltaLog is requested for a DataFrame for given AddFiles
- DeleteCommand and UpdateCommand are executed (and DeltaCommand is requested for a HadoopFsRelation)
Action Type¶
TahoeBatchFileIndex is given an Action Type identifier when created:
- batch or streaming when DeltaLog is requested for a batch or streaming DataFrame for given AddFiles, respectively
- delete for DeleteCommand
- update for UpdateCommand
Important
Action Type does not seem to be used anywhere.
tableVersion¶
tableVersion: Long
tableVersion is part of the TahoeFileIndex abstraction.
tableVersion is always the version of the Snapshot.
matchingFiles¶
matchingFiles(
partitionFilters: Seq[Expression],
dataFilters: Seq[Expression],
keepStats: Boolean = false): Seq[AddFile]
matchingFiles is part of the TahoeFileIndex abstraction.
matchingFiles uses filterFileList (that gives a DataFrame) and collects the AddFiles (using Dataset.collect).
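The filter-then-collect idea can be illustrated without Spark. The following is a minimal sketch, not Delta Lake's code: the AddFile case class is a simplified stand-in, and PartitionFilter is a hypothetical type that models a partition filter as a plain predicate over partition values (the real method works with Catalyst Expressions and a DataFrame). It models partition filters only and does not cover dataFilters or keepStats.

```scala
// Simplified stand-in for Delta Lake's AddFile action (assumption: only the fields this sketch needs).
final case class AddFile(path: String, partitionValues: Map[String, String], size: Long)

object MatchingFilesSketch {
  // Hypothetical model of a partition filter: a predicate over a file's partition values.
  type PartitionFilter = Map[String, String] => Boolean

  // Keep only the files whose partition values satisfy every filter.
  def matchingFiles(addFiles: Seq[AddFile], partitionFilters: Seq[PartitionFilter]): Seq[AddFile] =
    addFiles.filter(file => partitionFilters.forall(filter => filter(file.partitionValues)))

  def main(args: Array[String]): Unit = {
    val files = Seq(
      AddFile("date=2024-01-01/part-0.parquet", Map("date" -> "2024-01-01"), 100L),
      AddFile("date=2024-01-02/part-1.parquet", Map("date" -> "2024-01-02"), 200L))
    val onlySecondDay: PartitionFilter = _.get("date").contains("2024-01-02")
    // Print the paths of the files that match the filter.
    matchingFiles(files, Seq(onlySecondDay)).foreach(file => println(file.path))
  }
}
```

With no filters, every given file matches, since forall over an empty sequence is true.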
Input Files¶
inputFiles: Array[String]
inputFiles is part of the FileIndex (Spark SQL) abstraction.
inputFiles returns the paths of all the given AddFiles.
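As a rough illustration, inputFiles amounts to mapping the given AddFiles to their paths. The sketch below uses a simplified AddFile stand-in and returns the paths exactly as stored; whether the real implementation resolves them against the table's data directory is not shown here.

```scala
// Simplified stand-in for Delta Lake's AddFile action (assumption: only path and size are modeled).
final case class AddFile(path: String, size: Long)

object InputFilesSketch {
  // Collect the path of every given file, preserving order.
  def inputFiles(addFiles: Seq[AddFile]): Array[String] =
    addFiles.map(_.path).toArray

  def main(args: Array[String]): Unit = {
    val files = Seq(AddFile("part-0.parquet", 100L), AddFile("part-1.parquet", 200L))
    inputFiles(files).foreach(println)
  }
}
```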
Partitions¶
partitionSchema: StructType
partitionSchema is part of the FileIndex (Spark SQL) abstraction.
partitionSchema requests the Snapshot for the metadata that is in turn requested for the partitionSchema.
Estimated Size of Relation¶
sizeInBytes: Long
sizeInBytes is part of the FileIndex (Spark SQL) abstraction.
sizeInBytes is a sum of the sizes of all the given AddFiles.
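The size estimate can be pictured as a simple sum. This is a minimal sketch with a simplified AddFile stand-in, not the actual Delta Lake class.

```scala
// Simplified stand-in for Delta Lake's AddFile action (assumption: only path and size are modeled).
final case class AddFile(path: String, size: Long)

object SizeInBytesSketch {
  // Sum the sizes (in bytes) of all the given files; an empty list gives 0.
  def sizeInBytes(addFiles: Seq[AddFile]): Long =
    addFiles.map(_.size).sum

  def main(args: Array[String]): Unit = {
    val files = Seq(AddFile("part-0.parquet", 100L), AddFile("part-1.parquet", 200L))
    println(sizeInBytes(files))
  }
}
```

Because the index holds a fixed list of AddFiles, this estimate reflects only those files and not the whole table.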