TahoeBatchFileIndex

TahoeBatchFileIndex is a file index of a delta table at a given version.

Creating Instance

TahoeBatchFileIndex takes the following to be created:

TahoeBatchFileIndex is created when:

Action Type

TahoeBatchFileIndex is given an Action Type identifier when created:

Important

Action Type does not appear to be used anywhere.

tableVersion

tableVersion: Long

tableVersion is always the version of the Snapshot.

tableVersion is part of the TahoeFileIndex abstraction.

matchingFiles

matchingFiles(
  partitionFilters: Seq[Expression],
  dataFilters: Seq[Expression],
  keepStats: Boolean = false): Seq[AddFile]

matchingFiles uses filterFileList (that gives a DataFrame) and collects the AddFiles (using Dataset.collect).
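
The filter-then-collect idea can be sketched in plain Scala as a rough analogy. This is not Delta Lake's code: the AddFile case class below is a simplified stand-in with only a few fields, PartitionFilter is a hypothetical predicate type that replaces Catalyst Expressions, and no DataFrame is involved. dataFilters and keepStats are not modeled.

```scala
object MatchingFilesSketch {
  // Simplified stand-in for Delta's AddFile action (assumption: only these fields matter here).
  final case class AddFile(path: String, partitionValues: Map[String, String], size: Long)

  // Hypothetical predicate over a file's partition values (stands in for a Catalyst Expression).
  type PartitionFilter = Map[String, String] => Boolean

  // Keeps only the files whose partition values satisfy every filter.
  def matchingFiles(
      addFiles: Seq[AddFile],
      partitionFilters: Seq[PartitionFilter]): Seq[AddFile] =
    addFiles.filter(file => partitionFilters.forall(p => p(file.partitionValues)))
}
```

With no partition filters, every given AddFile is returned, since forall over an empty sequence is true.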

matchingFiles is part of the TahoeFileIndex abstraction.

Input Files

inputFiles: Array[String]

inputFiles returns the paths of all the given AddFiles.
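
A minimal sketch of this mapping in plain Scala follows. The AddFile case class is a simplified stand-in, and absolutePath is a hypothetical helper. It assumes that relative AddFile paths are resolved against the table root directory, which the text above does not state.

```scala
object InputFilesSketch {
  // Simplified stand-in for Delta's AddFile action (only the fields used here).
  final case class AddFile(path: String, size: Long)

  // Hypothetical helper: resolves a table-relative path against the table root.
  // Paths that already look absolute (leading "/" or a URI scheme) are kept as-is.
  def absolutePath(tableRoot: String, child: String): String = {
    val root = tableRoot.stripSuffix("/")
    if (child.startsWith("/") || child.contains("://")) child
    else s"$root/$child"
  }

  // One path per given AddFile, in the same order.
  def inputFiles(tableRoot: String, addFiles: Seq[AddFile]): Array[String] =
    addFiles.map(file => absolutePath(tableRoot, file.path)).toArray
}
```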

inputFiles is part of the FileIndex abstraction (Spark SQL).

Partitions

partitionSchema: StructType

partitionSchema requests the Snapshot for its metadata, which is in turn requested for the partitionSchema.

partitionSchema is part of the FileIndex abstraction (Spark SQL).

Estimated Size of Relation

sizeInBytes: Long

sizeInBytes is the sum of the sizes of all the given AddFiles.
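
As an illustration, the sum can be written in plain Scala as below. The AddFile case class is again a simplified stand-in with only the fields needed here, not Delta Lake's actual class.

```scala
object SizeInBytesSketch {
  // Simplified stand-in for Delta's AddFile action (only the fields used here).
  final case class AddFile(path: String, size: Long)

  // Total size in bytes of all the given files; 0 when there are none.
  def sizeInBytes(addFiles: Seq[AddFile]): Long =
    addFiles.map(_.size).sum
}
```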

sizeInBytes is part of the FileIndex abstraction (Spark SQL).


Last update: 2020-10-05