DeltaLogFileIndex
DeltaLogFileIndex is a FileIndex (Spark SQL) that Snapshot uses for the commit and checkpoint files.
Creating Instance
DeltaLogFileIndex takes the following to be created:
- FileFormat
- Files (as Hadoop FileStatuses)
While being created, DeltaLogFileIndex prints out the following INFO message to the logs:
Created [this]
DeltaLogFileIndex is created (indirectly, using the apply utility) when Snapshot is requested for a DeltaLogFileIndex for commit or checkpoint files.
FileFormat
DeltaLogFileIndex is given a FileFormat (Spark SQL) when created.
Text Representation
toString: String
toString returns the following (using the given FileFormat, the number of files and their estimated size):
DeltaLogFileIndex([format], numFilesInSegment: [files], totalFileSize: [sizeInBytes])
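As a rough illustration of how this text representation could be assembled, here is a minimal, self-contained sketch. It does not use the real Spark or Hadoop classes: StubFileStatus, StubFileFormat and ToStringSketch are hypothetical stand-ins, and computing the size as the sum of the file lengths is an assumption made for the example.

```scala
// Hypothetical stand-ins for Hadoop's FileStatus and Spark SQL's FileFormat,
// so the sketch runs without Spark or Hadoop on the classpath.
final case class StubFileStatus(path: String, length: Long)
final case class StubFileFormat(name: String) {
  override def toString: String = name
}

// Simplified model of the index: a file format plus the files it covers.
final case class ToStringSketch(format: StubFileFormat, files: Array[StubFileStatus]) {
  // Assumption for this sketch: the size is the sum of the file lengths.
  def sizeInBytes: Long = files.map(_.length).sum

  // Mirrors the documented pattern:
  // DeltaLogFileIndex([format], numFilesInSegment: [files], totalFileSize: [sizeInBytes])
  override def toString: String =
    s"DeltaLogFileIndex($format, numFilesInSegment: ${files.length}, totalFileSize: $sizeInBytes)"
}

object ToStringDemo {
  def main(args: Array[String]): Unit = {
    val index = ToStringSketch(
      StubFileFormat("JSON"),
      Array(
        StubFileStatus("_delta_log/00000000000000000000.json", 1024L),
        StubFileStatus("_delta_log/00000000000000000001.json", 2048L)))
    // prints: DeltaLogFileIndex(JSON, numFilesInSegment: 2, totalFileSize: 3072)
    println(index)
  }
}
```

The same message shape is what the "Created [this]" INFO line would show, since [this] is rendered through toString.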
Creating DeltaLogFileIndex
apply(
format: FileFormat,
files: Seq[FileStatus]): Option[DeltaLogFileIndex]
apply creates a new DeltaLogFileIndex for a non-empty collection of files. As the Option return type suggests, no DeltaLogFileIndex is created for an empty collection.
apply is used when Snapshot is requested for DeltaLogFileIndex for commit or checkpoint files.
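The Option-returning factory pattern described above can be sketched as follows. This is only an illustration under stated assumptions: FileEntry, LogFormat and ApplySketch are hypothetical stand-ins rather than the Delta Lake classes, and returning None for an empty collection is inferred from the signature.

```scala
// Hypothetical stand-ins for Hadoop's FileStatus and Spark SQL's FileFormat.
final case class FileEntry(path: String, length: Long)
final case class LogFormat(name: String)

// The constructor is private so that instances are created through apply only.
final class ApplySketch private (val format: LogFormat, val files: Array[FileEntry])

object ApplySketch {
  // Some(index) for a non-empty collection of files, None otherwise
  // (the None branch is an assumption inferred from the Option return type).
  def apply(format: LogFormat, files: Seq[FileEntry]): Option[ApplySketch] =
    if (files.isEmpty) None else Some(new ApplySketch(format, files.toArray))
}

object ApplyDemo {
  def main(args: Array[String]): Unit = {
    val json = LogFormat("JSON")
    val empty = ApplySketch(json, Seq.empty)
    val single = ApplySketch(json, Seq(FileEntry("_delta_log/00000000000000000000.json", 1024L)))
    println(empty.isDefined)            // prints: false
    println(single.map(_.files.length)) // prints: Some(1)
  }
}
```

Returning an Option lets a caller treat "no commit files" or "no checkpoint files" as an absent index rather than an index over zero files.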
Logging
Enable the ALL logging level for the org.apache.spark.sql.delta.DeltaLogFileIndex logger to see what happens inside.
Add the following line to conf/log4j.properties:
log4j.logger.org.apache.spark.sql.delta.DeltaLogFileIndex=ALL
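The line above uses the Log4j 1.x properties syntax. Spark 3.3 and later ship with Log4j 2, where the configuration file is conf/log4j2.properties and the syntax differs. A roughly equivalent fragment might look as follows (the logger key DeltaLogFileIndex is an arbitrary label chosen for this example):

```properties
# Log4j 2 properties syntax; "DeltaLogFileIndex" after "logger." is an arbitrary label
logger.DeltaLogFileIndex.name = org.apache.spark.sql.delta.DeltaLogFileIndex
logger.DeltaLogFileIndex.level = all
```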
Refer to Logging.