DeltaLogFileIndex
DeltaLogFileIndex is a FileIndex (Spark SQL) that a Snapshot uses for its Delta log files (the commit and checkpoint files).
Creating Instance
DeltaLogFileIndex takes the following to be created:
- FileFormat
- Files (as Hadoop FileStatuses)
While being created, DeltaLogFileIndex prints out the following INFO message to the logs:
Created [this]
DeltaLogFileIndex is created indirectly (using the apply utility) when a Snapshot is requested for a DeltaLogFileIndex for its commit or checkpoint files.
FileFormat
DeltaLogFileIndex is given a FileFormat (Spark SQL) when created:
- JsonFileFormat (Spark SQL) for commit files
- ParquetFileFormat (Spark SQL) for checkpoint files
Text Representation
toString: String
toString returns the following text, built from the given FileFormat, the number of files and their total size in bytes:
DeltaLogFileIndex([format], numFilesInSegment: [files], totalFileSize: [sizeInBytes])
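The following is a minimal, self-contained sketch of how such a text representation, and the "Created [this]" message that reuses it, could be produced. It does not use Spark or Hadoop: FileStatusStub, SketchLog and LogFileIndexSketch are hypothetical stand-ins for FileStatus, the logger and DeltaLogFileIndex, the format is a plain string label rather than a FileFormat, and the size is assumed to be the sum of the file lengths.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical stand-in for Hadoop's FileStatus: only the path and length (in bytes) matter here.
final case class FileStatusStub(path: String, len: Long)

// Hypothetical stand-in for the logger: collects messages instead of printing them.
object SketchLog {
  val createdMessages: ListBuffer[String] = ListBuffer.empty[String]
}

// Hypothetical stand-in for DeltaLogFileIndex (format is a plain label, not a Spark FileFormat).
final class LogFileIndexSketch(val format: String, val files: Seq[FileStatusStub]) {

  // Assumption: the total size is the sum of the lengths of all the files.
  // Declared before the logging statement below so it is initialized when toString runs.
  val sizeInBytes: Long = files.map(_.len).sum

  override def toString: String =
    s"DeltaLogFileIndex($format, numFilesInSegment: ${files.size}, totalFileSize: $sizeInBytes)"

  // Statements in a class body run during construction,
  // mirroring the "Created [this]" INFO message.
  SketchLog.createdMessages += s"Created $this"
}
```

Creating an instance over two commit files of 120 and 80 bytes labelled "JSON" yields DeltaLogFileIndex(JSON, numFilesInSegment: 2, totalFileSize: 200), and the same text appears after "Created " in the collected message.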
Creating DeltaLogFileIndex
apply(
  format: FileFormat,
  files: Seq[FileStatus]): Option[DeltaLogFileIndex]
apply creates a new DeltaLogFileIndex only when the given collection of files is non-empty. For an empty collection, apply returns None.
apply is used when a Snapshot is requested for a DeltaLogFileIndex for its commit or checkpoint files.
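A minimal sketch of an Option-returning factory with the same shape as apply, using hypothetical stand-ins (FileStatusStub, LogIndexStub) instead of the Spark and Hadoop types. It illustrates the one decision apply makes: no files, no index.

```scala
// Hypothetical stand-in for Hadoop's FileStatus.
final case class FileStatusStub(path: String, len: Long)

// Hypothetical stand-in for DeltaLogFileIndex; the private constructor forces callers through apply.
final class LogIndexStub private (val format: String, val files: Vector[FileStatusStub])

object LogIndexStub {
  // An empty collection of files yields None; any other collection
  // yields Some index over (a copy of) those files.
  def apply(format: String, files: Seq[FileStatusStub]): Option[LogIndexStub] =
    if (files.isEmpty) None else Some(new LogIndexStub(format, files.toVector))
}
```

With this shape, a caller asking for an index over the checkpoint files of a log that has none gets None rather than an empty index, while a non-empty list of commit files gives Some index.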
Logging
Enable ALL logging level for the org.apache.spark.sql.delta.DeltaLogFileIndex logger to see what happens inside.
Add the following line to conf/log4j.properties:
log4j.logger.org.apache.spark.sql.delta.DeltaLogFileIndex=ALL
Refer to Logging.