
TahoeBatchFileIndex

TahoeBatchFileIndex is a TahoeFileIndexWithSnapshotDescriptor over a given set of AddFiles of a Delta table at a given version.

Creating Instance

TahoeBatchFileIndex takes the following to be created:

TahoeBatchFileIndex is created when:

Action Type

TahoeBatchFileIndex is given an Action Type identifier when created:

Important

Action Type does not seem to be used anywhere.

tableVersion

tableVersion: Long

tableVersion is part of the TahoeFileIndex abstraction.

tableVersion is always the version of the Snapshot.

matchingFiles

matchingFiles(
  partitionFilters: Seq[Expression],
  dataFilters: Seq[Expression],
  keepStats: Boolean = false): Seq[AddFile]

matchingFiles is part of the TahoeFileIndex abstraction.

matchingFiles uses filterFileList (which gives a DataFrame) to filter the given AddFiles by the partition filters, and then collects the remaining AddFiles (using Dataset.collect).
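The filtering step can be sketched in plain Scala, with no Spark or Delta dependency. This is a minimal sketch under stated assumptions: the `AddFile` case class is a hypothetical, simplified stand-in for Delta's action, partition filters are modeled as hypothetical Scala predicates over partition values (the real method takes Catalyst `Expression`s), and `dataFilters` and `keepStats` are left out.

```scala
object MatchingFilesSketch {
  // Hypothetical, simplified stand-in for Delta's AddFile action (not the real class)
  final case class AddFile(path: String, partitionValues: Map[String, String], size: Long)

  // Hypothetical partition filter: a plain predicate over a file's partition values
  // (the real matchingFiles takes Catalyst Expressions instead)
  type PartitionFilter = Map[String, String] => Boolean

  // Keeps the AddFiles whose partition values satisfy every partition filter,
  // roughly what filterFileList followed by collect does for the given AddFiles
  def matchingFiles(
      addFiles: Seq[AddFile],
      partitionFilters: Seq[PartitionFilter]): Seq[AddFile] =
    addFiles.filter(file => partitionFilters.forall(filter => filter(file.partitionValues)))
}
```

With no partition filters, every given AddFile matches, just as an unfiltered scan would read all the files the index was created with.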

Input Files

inputFiles: Array[String]

inputFiles is part of the FileIndex (Spark SQL) abstraction.

inputFiles returns the paths of all the given AddFiles.
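A minimal sketch of inputFiles, assuming a hypothetical, simplified `AddFile` case class that carries only a path and a size. The sketch returns the paths as stored; the real index may resolve relative paths against the table location, which is not modeled here.

```scala
object InputFilesSketch {
  // Hypothetical, simplified stand-in for Delta's AddFile action (not the real class)
  final case class AddFile(path: String, size: Long)

  // Returns the path of every given AddFile, in the order the files were given
  def inputFiles(addFiles: Seq[AddFile]): Array[String] =
    addFiles.map(_.path).toArray
}
```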

Partitions

partitionSchema: StructType

partitionSchema is part of the FileIndex (Spark SQL) abstraction.

partitionSchema requests the Snapshot for the Metadata and then requests the Metadata for the partitionSchema.
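The delegation chain (Snapshot to Metadata to partitionSchema) can be sketched as follows. `Field`, `Metadata` and `Snapshot` are hypothetical, simplified stand-ins for Spark's `StructField`/`StructType` and Delta's classes, and the sketch assumes the partition schema is made of the table schema's fields for the partition columns, in partition-column order.

```scala
object PartitionSchemaSketch {
  // Hypothetical, simplified stand-in for Spark's StructField
  final case class Field(name: String, dataType: String)

  // Hypothetical, simplified stand-in for Delta's Metadata
  final case class Metadata(schema: Seq[Field], partitionColumns: Seq[String]) {
    // Looks up the schema field of every partition column, in partition-column order
    def partitionSchema: Seq[Field] =
      partitionColumns.map { column =>
        schema.find(_.name == column).getOrElse(
          throw new IllegalArgumentException(s"Partition column $column not in schema"))
      }
  }

  // Hypothetical, simplified stand-in for Delta's Snapshot
  final case class Snapshot(version: Long, metadata: Metadata)

  // The index itself only delegates: Snapshot -> Metadata -> partitionSchema
  def partitionSchema(snapshot: Snapshot): Seq[Field] =
    snapshot.metadata.partitionSchema
}
```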

Estimated Size of Relation

sizeInBytes: Long

sizeInBytes is part of the FileIndex (Spark SQL) abstraction.

sizeInBytes is a sum of the sizes of all the given AddFiles.
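As a minimal sketch, assuming a hypothetical, simplified `AddFile` case class whose `size` is the file size in bytes, the estimate is just a sum over the given files:

```scala
object SizeInBytesSketch {
  // Hypothetical, simplified stand-in for Delta's AddFile action (not the real class)
  final case class AddFile(path: String, size: Long)

  // Sums the sizes (in bytes) of all the given AddFiles; 0 when there are no files
  def sizeInBytes(addFiles: Seq[AddFile]): Long =
    addFiles.map(_.size).sum
}
```

Because the sum only covers the AddFiles the index was created with, the estimate reflects that subset of files rather than the whole table.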