DeltaLogFileIndex

DeltaLogFileIndex is a FileIndex for Snapshot (for the commit and checkpoint files).

Note

Learn more about FileIndex in The Internals of Spark SQL online book.

Creating Instance

DeltaLogFileIndex takes the following to be created:

  • FileFormat
  • Files (Hadoop FileStatus)

While being created, DeltaLogFileIndex prints out the following INFO message to the logs:

Created [this]

DeltaLogFileIndex is created (indirectly, using the apply utility) when Snapshot is requested for the DeltaLogFileIndex of the commit or checkpoint files.

FileFormat

DeltaLogFileIndex is given a FileFormat (Spark SQL) when created:

  • JsonFileFormat (Spark SQL) for commit files
  • ParquetFileFormat (Spark SQL) for checkpoint files

Text Representation

toString: String

toString returns the following (using the given FileFormat, the number of files and their estimated size):

DeltaLogFileIndex([format], numFilesInSegment: [files], totalFileSize: [sizeInBytes])
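
As a rough illustration of the format above, the following dependency-free sketch models the text representation. FileInfo and LogFileIndexModel are hypothetical stand-ins (they are not Delta Lake or Spark SQL classes), the FileFormat is reduced to a plain name, and the size is assumed to be the sum of the file lengths.

```scala
// Hypothetical stand-in for Hadoop's FileStatus (only path and length are modeled).
final case class FileInfo(path: String, length: Long)

// Simplified model of the index; NOT the actual Delta Lake class.
final case class LogFileIndexModel(format: String, files: Seq[FileInfo]) {
  // Assumption: the reported size is the sum of the lengths of all files.
  val sizeInBytes: Long = files.map(_.length).sum

  override def toString: String =
    s"DeltaLogFileIndex($format, numFilesInSegment: ${files.size}, totalFileSize: $sizeInBytes)"
}

object ToStringSketch {
  def main(args: Array[String]): Unit = {
    val commits = Seq(
      FileInfo("_delta_log/00000000000000000000.json", 1024L),
      FileInfo("_delta_log/00000000000000000001.json", 512L))

    // prints: DeltaLogFileIndex(JSON, numFilesInSegment: 2, totalFileSize: 1536)
    println(LogFileIndexModel("JSON", commits))
  }
}
```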

Creating DeltaLogFileIndex

apply(
  format: FileFormat,
  files: Seq[FileStatus]): Option[DeltaLogFileIndex]

apply creates a new DeltaLogFileIndex for a non-empty collection of files. For an empty collection, apply returns None.

apply is used when Snapshot is requested for the DeltaLogFileIndex of the commit or checkpoint files.
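
The Option-returning factory can be sketched without any Spark or Delta Lake dependencies. LogFile and LogFileIndexSketch below are hypothetical names used for illustration only, the format is modeled as a plain string, and returning None for an empty collection is an assumption based on the Option return type.

```scala
// Hypothetical stand-in for Hadoop's FileStatus.
final case class LogFile(path: String, length: Long)

// Simplified model with a private constructor, so instances
// can only be obtained through the companion's apply.
final class LogFileIndexSketch private (val format: String, val files: Seq[LogFile])

object LogFileIndexSketch {
  // Assumption: None for an empty collection, Some(index) otherwise.
  def apply(format: String, files: Seq[LogFile]): Option[LogFileIndexSketch] =
    if (files.isEmpty) None else Some(new LogFileIndexSketch(format, files))
}

object ApplySketch {
  def main(args: Array[String]): Unit = {
    val noCheckpoints = LogFileIndexSketch("Parquet", Seq.empty)
    val commits = LogFileIndexSketch(
      "JSON",
      Seq(LogFile("_delta_log/00000000000000000000.json", 1024L)))

    println(noCheckpoints.isDefined)      // prints: false
    println(commits.map(_.files.size))    // prints: Some(1)
  }
}
```

With an Option result, a caller can handle "no commit files" or "no checkpoint files" explicitly instead of building an index over an empty file list.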

Logging

Enable the ALL logging level for the org.apache.spark.sql.delta.DeltaLogFileIndex logger to see what happens inside.

Add the following line to conf/log4j.properties:

log4j.logger.org.apache.spark.sql.delta.DeltaLogFileIndex=ALL

Refer to Logging.


Last update: 2020-10-05