TahoeBatchFileIndex¶
TahoeBatchFileIndex is a TahoeFileIndexWithSnapshotDescriptor of a delta table at a given version.
Creating Instance¶
TahoeBatchFileIndex takes the following to be created, among others:
- Action Type
- AddFiles
- Snapshot
TahoeBatchFileIndex is created when:
- DeltaLog is requested for a DataFrame for given AddFiles
- DeleteCommand and UpdateCommand are executed (and DeltaCommand is requested for a HadoopFsRelation)
Action Type¶
TahoeBatchFileIndex is given an Action Type identifier when created:
- batch or streaming when DeltaLog is requested for a batch or streaming DataFrame for given AddFiles, respectively
- delete for DeleteCommand
- update for UpdateCommand
Important
Action Type does not seem to be used anywhere.
tableVersion¶
tableVersion: Long
tableVersion is part of the TahoeFileIndex abstraction.
tableVersion is always the version of the Snapshot.
matchingFiles¶
matchingFiles(
partitionFilters: Seq[Expression],
dataFilters: Seq[Expression],
keepStats: Boolean = false): Seq[AddFile]
matchingFiles is part of the TahoeFileIndex abstraction.
matchingFiles uses filterFileList (that gives a DataFrame) and collects the AddFiles (using Dataset.collect).
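The filter-then-collect flow described above can be illustrated with plain Scala collections. This is a minimal sketch under stated assumptions, not Delta Lake's code: `SimpleAddFile`, `PartitionPredicate` and `matchingFilesSketch` are hypothetical stand-ins for AddFile, the Catalyst partition filter expressions, and the DataFrame-based filterFileList followed by Dataset.collect, respectively.

```scala
// Hypothetical, simplified stand-in for Delta Lake's AddFile action.
final case class SimpleAddFile(
    path: String,
    partitionValues: Map[String, String],
    size: Long)

object MatchingFilesSketch {
  // Hypothetical stand-in for a partition filter Expression:
  // a predicate over a file's partition values.
  type PartitionPredicate = Map[String, String] => Boolean

  // Keeps only the files whose partition values satisfy every filter.
  // (The method described above filters a DataFrame and collects the
  // result; here an in-memory Seq plays both roles.)
  def matchingFilesSketch(
      addFiles: Seq[SimpleAddFile],
      partitionFilters: Seq[PartitionPredicate]): Seq[SimpleAddFile] =
    addFiles.filter(file => partitionFilters.forall(p => p(file.partitionValues)))

  def main(args: Array[String]): Unit = {
    val files = Seq(
      SimpleAddFile("date=2024-01-01/part-0.parquet", Map("date" -> "2024-01-01"), 100L),
      SimpleAddFile("date=2024-01-02/part-0.parquet", Map("date" -> "2024-01-02"), 250L))
    val onlyJan2: PartitionPredicate = _.get("date").contains("2024-01-02")
    matchingFilesSketch(files, Seq(onlyJan2)).foreach(f => println(f.path))
  }
}
```

With no partition filters, every given file is returned, which matches the idea of a file index over a fixed set of AddFiles.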
Input Files¶
inputFiles: Array[String]
inputFiles is part of the FileIndex (Spark SQL) abstraction.
inputFiles returns the paths of all the given AddFiles.
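As a rough illustration of the mapping from AddFiles to paths, consider the following minimal sketch. `SimpleAddFile` and `inputFilesSketch` are hypothetical names, and the sketch returns paths exactly as stored; whether and how paths are resolved against the table directory is not covered here.

```scala
// Hypothetical, simplified stand-in for Delta Lake's AddFile action.
final case class SimpleAddFile(path: String, size: Long)

object InputFilesSketch {
  // Returns the path of every given file, in order, as an Array[String]
  // (the same return type as inputFiles above).
  def inputFilesSketch(addFiles: Seq[SimpleAddFile]): Array[String] =
    addFiles.map(_.path).toArray

  def main(args: Array[String]): Unit = {
    val files = Seq(
      SimpleAddFile("part-0.parquet", 100L),
      SimpleAddFile("part-1.parquet", 250L))
    inputFilesSketch(files).foreach(println)
  }
}
```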
Partitions¶
partitionSchema: StructType
partitionSchema is part of the FileIndex (Spark SQL) abstraction.
partitionSchema requests the Snapshot for the metadata that is in turn requested for the partitionSchema.
Estimated Size of Relation¶
sizeInBytes: Long
sizeInBytes is part of the FileIndex (Spark SQL) abstraction.
sizeInBytes is a sum of the sizes of all the given AddFiles.
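The size estimate is a plain sum over the given files, which the following minimal sketch illustrates. `SimpleAddFile` and `sizeInBytesSketch` are hypothetical names used only for this example and are not part of Delta Lake.

```scala
// Hypothetical, simplified stand-in for Delta Lake's AddFile action.
final case class SimpleAddFile(path: String, size: Long)

object SizeInBytesSketch {
  // Sums the sizes (in bytes) of all the given files; an empty input gives 0.
  def sizeInBytesSketch(addFiles: Seq[SimpleAddFile]): Long =
    addFiles.map(_.size).sum

  def main(args: Array[String]): Unit = {
    val files = Seq(
      SimpleAddFile("part-0.parquet", 100L),
      SimpleAddFile("part-1.parquet", 250L))
    println(sizeInBytesSketch(files)) // prints 350
  }
}
```

Because the index is built from an explicit set of AddFiles, the estimate only covers those files and not the whole table.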