FileStreamSinkLog¶
FileStreamSinkLog is a CompactibleFileStreamLog (of SinkFileStatuses) for FileStreamSink and MetadataLogFileIndex.
FileStreamSinkLog concatenates metadata logs into a single compact file after a defined compact interval.
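To make the compact interval concrete, the following is a minimal, self-contained sketch of how compaction batches could be selected. It is a model rather than Spark's code: the object name `CompactionScheduleSketch` is hypothetical, and the rule `(batchId + 1) % compactInterval == 0` is stated here as an assumption about how the parent CompactibleFileStreamLog schedules compaction.

```scala
// Simplified model of the compaction schedule; it does not use Spark classes.
object CompactionScheduleSketch {

  // Assumption: a batch is a compaction batch when (batchId + 1)
  // is a multiple of the compact interval.
  def isCompactionBatch(batchId: Long, compactInterval: Int): Boolean =
    (batchId + 1) % compactInterval == 0

  def main(args: Array[String]): Unit = {
    val interval = 10 // illustrative compact interval
    val compactionBatches = (0L to 29L).filter(isCompactionBatch(_, interval))
    println(compactionBatches.mkString(", ")) // prints "9, 19, 29"
  }
}
```

Under this assumption and an interval of 10, every tenth batch (9, 19, 29, and so on) would write a compact file that subsumes the preceding metadata logs.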
Creating Instance¶
FileStreamSinkLog (like the parent CompactibleFileStreamLog) takes the following to be created:
- Version of the Metadata Log
- SparkSession
- Path of the Metadata Log
Configuration Properties¶
spark.sql.streaming.fileSink.log.cleanupDelay¶
FileStreamSinkLog uses spark.sql.streaming.fileSink.log.cleanupDelay configuration property for fileCleanupDelayMs.
spark.sql.streaming.fileSink.log.compactInterval¶
FileStreamSinkLog uses spark.sql.streaming.fileSink.log.compactInterval configuration property for defaultCompactInterval.
spark.sql.streaming.fileSink.log.deletion¶
FileStreamSinkLog uses spark.sql.streaming.fileSink.log.deletion configuration property for isDeletingExpiredLog.
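As an illustration only, the three properties above could be set when building a SparkSession. This sketch assumes spark-sql is on the classpath; the object name, the application name, and the values (an interval of 10, a delay of 600000 milliseconds, deletion enabled) are illustrative choices and not documented recommendations.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical driver object; requires spark-sql on the classpath.
object FileSinkLogConfigSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("file-sink-log-config-sketch")
      // Illustrative values only.
      .config("spark.sql.streaming.fileSink.log.compactInterval", "10")
      .config("spark.sql.streaming.fileSink.log.cleanupDelay", "600000") // milliseconds
      .config("spark.sql.streaming.fileSink.log.deletion", "true")
      .getOrCreate()

    // Read one of the settings back to confirm it was applied.
    println(spark.conf.get("spark.sql.streaming.fileSink.log.compactInterval"))

    spark.stop()
  }
}
```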
Compacting Logs¶
compactLogs(
logs: Seq[SinkFileStatus]): Seq[SinkFileStatus]
compactLogs finds delete actions in the given collection of SinkFileStatuses.
If there are no deletes, compactLogs gives the SinkFileStatuses back (unmodified).
Otherwise, compactLogs removes the deleted paths from the SinkFileStatuses.
compactLogs is part of the CompactibleFileStreamLog abstraction.
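The filtering described above can be sketched as follows. This is a simplified, self-contained model and not the Spark implementation: `FileEntry` is a hypothetical stand-in for SinkFileStatus that keeps only a path and an action, and the action names `"add"` and `"delete"` are assumptions made for the example.

```scala
object CompactLogsSketch {

  // Hypothetical, trimmed-down stand-in for SinkFileStatus.
  final case class FileEntry(path: String, action: String)

  // Assumed action names for this sketch.
  val AddAction = "add"
  val DeleteAction = "delete"

  def compactLogs(logs: Seq[FileEntry]): Seq[FileEntry] = {
    // Collect the paths of all entries marked with the delete action.
    val deletedPaths = logs.filter(_.action == DeleteAction).map(_.path).toSet
    if (deletedPaths.isEmpty) {
      logs // no deletes: return the entries unmodified
    } else {
      // Drop every entry (add or delete) whose path was deleted.
      logs.filterNot(entry => deletedPaths.contains(entry.path))
    }
  }

  def main(args: Array[String]): Unit = {
    val logs = Seq(
      FileEntry("/out/part-0", AddAction),
      FileEntry("/out/part-1", AddAction),
      FileEntry("/out/part-0", DeleteAction))
    println(compactLogs(logs).map(_.path)) // prints "List(/out/part-1)"
  }
}
```

Note that in this sketch both the add entry and the delete entry for a deleted path disappear from the compacted result, so only files that are still live remain.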
Version¶
FileStreamSinkLog uses 1 for the version.
Actions¶
Add¶
FileStreamSinkLog uses add action to create new metadata logs.
Delete¶
FileStreamSinkLog uses delete action to mark status files to be excluded from compaction.
Important
Delete action is not used in Spark Structured Streaming and will be removed in 3.1.0.