FileStreamSinkLog
FileStreamSinkLog is a CompactibleFileStreamLog (of SinkFileStatuses) used by FileStreamSink and MetadataLogFileIndex.
FileStreamSinkLog concatenates the metadata logs into a single compact file once every compact interval (a number of batches set with spark.sql.streaming.fileSink.log.compactInterval).
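Which batches produce a compact file can be sketched as follows. This is a minimal model, assuming the rule used by Spark 3.0's `CompactibleFileStreamLog.isCompactionBatch`: a batch is a compaction batch when `batchId + 1` is a multiple of the compact interval. `CompactionSchedule` and `compactionBatches` are hypothetical names used only for illustration.

```scala
// Hypothetical helper modeling when the sink log writes a compact file.
// Assumption: mirrors CompactibleFileStreamLog.isCompactionBatch in Spark 3.0.
object CompactionSchedule {
  // Batch IDs start at 0, so with an interval of 10 the 10th batch (ID 9) compacts.
  def isCompactionBatch(batchId: Long, compactInterval: Int): Boolean =
    (batchId + 1) % compactInterval == 0

  // Compaction batch IDs among the first `numBatches` batches.
  def compactionBatches(numBatches: Long, compactInterval: Int): Seq[Long] =
    (0L until numBatches).filter(isCompactionBatch(_, compactInterval))
}
```

With an interval of 10 the sketch picks batches 9, 19 and 29; as far as I know, Spark writes the log files of such batches with a `.compact` suffix (for example `9.compact`).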
Creating Instance
FileStreamSinkLog (like the parent CompactibleFileStreamLog) takes the following to be created:
- Version of the Metadata Log
- SparkSession
- Path of the Metadata Log
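A rough sketch of where the version and path arguments usually come from. It assumes, based on Spark 3.0's FileStreamSink, that the log lives in the `_spark_metadata` subdirectory of the sink's output path and that the version is `FileStreamSinkLog.VERSION` (1). `SinkLogArgs` and `SinkLogPaths` are hypothetical stand-ins, and the SparkSession argument is left out so the sketch runs without Spark.

```scala
// Hypothetical stand-in for FileStreamSinkLog's constructor arguments.
// The SparkSession argument is omitted so the sketch runs without Spark.
// Roughly what FileStreamSink does (assumption):
//   new FileStreamSinkLog(FileStreamSinkLog.VERSION, sparkSession, logPath.toString)
final case class SinkLogArgs(metadataLogVersion: Int, path: String)

object SinkLogPaths {
  val Version = 1                     // FileStreamSinkLog uses version 1
  val MetadataDir = "_spark_metadata" // assumed name of the sink's metadata directory

  // Derives the metadata log path from the sink's output directory.
  def forOutputPath(outputPath: String): SinkLogArgs =
    SinkLogArgs(Version, s"${outputPath.stripSuffix("/")}/$MetadataDir")
}
```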
Configuration Properties
spark.sql.streaming.fileSink.log.cleanupDelay
FileStreamSinkLog uses the spark.sql.streaming.fileSink.log.cleanupDelay configuration property for fileCleanupDelayMs.
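Spark describes this property as how long a file is guaranteed to be visible for all readers, so it acts as a minimum age before an expired log file may be removed. The sketch below models only that age check, under that assumption; `LogFile` and `CleanupDelay` are hypothetical, and the batch-ID bounds Spark also applies when picking expired files are left out.

```scala
// Hypothetical model of a metadata log file: batch ID and last modification time (ms).
final case class LogFile(batchId: Long, modificationTime: Long)

object CleanupDelay {
  // A file only becomes deletable once it is older than the cleanup delay,
  // so readers still listing it keep seeing it for at least that long.
  def pastDelay(file: LogFile, nowMs: Long, fileCleanupDelayMs: Long): Boolean =
    file.modificationTime <= nowMs - fileCleanupDelayMs

  // Of the candidate (already expired) files, keep only those old enough to delete.
  def deletable(candidates: Seq[LogFile], nowMs: Long, fileCleanupDelayMs: Long): Seq[LogFile] =
    candidates.filter(pastDelay(_, nowMs, fileCleanupDelayMs))
}
```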
spark.sql.streaming.fileSink.log.compactInterval
FileStreamSinkLog uses the spark.sql.streaming.fileSink.log.compactInterval configuration property for defaultCompactInterval.
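The interval has to be positive. A minimal sketch of that validation, assuming it matches the `require(defaultCompactInterval > 0, ...)` check in Spark 3.0's FileStreamSinkLog; `CompactIntervalCheck` is a hypothetical name and the message text is approximate.

```scala
// Hypothetical validation of the compact interval (modeled on a require check).
object CompactIntervalCheck {
  val Key = "spark.sql.streaming.fileSink.log.compactInterval"

  // Throws IllegalArgumentException unless the interval is positive.
  // A zero interval would also break the modulo-based compaction check.
  def validate(defaultCompactInterval: Int): Int = {
    require(defaultCompactInterval > 0,
      s"Please set $Key (was $defaultCompactInterval) to a positive value.")
    defaultCompactInterval
  }
}
```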
spark.sql.streaming.fileSink.log.deletion
FileStreamSinkLog uses the spark.sql.streaming.fileSink.log.deletion configuration property for isDeletingExpiredLog.
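The three properties can be gathered into one hypothetical settings holder. The defaults below are, as far as I know, the ones Spark 3.0's SQLConf declares (10 minutes, 10 batches, `true`). Parsing is simplified to plain milliseconds, whereas Spark also accepts time strings such as `10m` for the cleanup delay. `FileSinkLogSettings` and `SinkLogConf` are not Spark classes.

```scala
// Hypothetical holder for the three file sink log properties (not a Spark class).
final case class FileSinkLogSettings(
    fileCleanupDelayMs: Long,      // spark.sql.streaming.fileSink.log.cleanupDelay
    defaultCompactInterval: Int,   // spark.sql.streaming.fileSink.log.compactInterval
    isDeletingExpiredLog: Boolean) // spark.sql.streaming.fileSink.log.deletion

object SinkLogConf {
  private val Prefix = "spark.sql.streaming.fileSink.log."
  val DefaultCleanupDelayMs: Long = 10L * 60 * 1000 // assumed default: 10 minutes
  val DefaultCompactInterval: Int = 10              // assumed default
  val DefaultDeletion: Boolean = true               // assumed default

  // Reads the properties from a plain key-value map, falling back to the defaults.
  // Simplification: the cleanup delay is expected in milliseconds.
  def fromConf(conf: Map[String, String]): FileSinkLogSettings =
    FileSinkLogSettings(
      fileCleanupDelayMs =
        conf.get(Prefix + "cleanupDelay").map(_.toLong).getOrElse(DefaultCleanupDelayMs),
      defaultCompactInterval =
        conf.get(Prefix + "compactInterval").map(_.toInt).getOrElse(DefaultCompactInterval),
      isDeletingExpiredLog =
        conf.get(Prefix + "deletion").map(_.toBoolean).getOrElse(DefaultDeletion))
}
```

In a real session the properties are set like any other SQL configuration, e.g. `spark.conf.set("spark.sql.streaming.fileSink.log.deletion", "false")`, before the streaming query starts.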
Compacting Logs
compactLogs(logs: Seq[SinkFileStatus]): Seq[SinkFileStatus]
compactLogs finds the delete actions in the given collection of SinkFileStatuses.
If there are no deletes, compactLogs gives the SinkFileStatuses back unmodified.
Otherwise, compactLogs removes every SinkFileStatus whose path is among the deleted paths (the delete entries themselves included).
compactLogs is part of the CompactibleFileStreamLog abstraction.
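The compaction logic is small enough to restate in full. The sketch below uses a trimmed-down, hypothetical `SinkFileStatus` with only `path` and `action` (Spark's case class also carries the size, directory flag, modification time, block replication and block size); the filtering follows compactLogs as described above.

```scala
// Trimmed-down, hypothetical version of Spark's SinkFileStatus.
final case class SinkFileStatus(path: String, action: String)

object SinkLogCompaction {
  val AddAction = "add"       // FileStreamSinkLog.ADD_ACTION
  val DeleteAction = "delete" // FileStreamSinkLog.DELETE_ACTION

  def compactLogs(logs: Seq[SinkFileStatus]): Seq[SinkFileStatus] = {
    // Paths marked with a delete action anywhere in the logs
    val deletedFiles = logs.filter(_.action == DeleteAction).map(_.path).toSet
    if (deletedFiles.isEmpty) {
      logs // no deletes: hand the statuses back unmodified
    } else {
      // drop every entry (add or delete) for a deleted path
      logs.filterNot(f => deletedFiles.contains(f.path))
    }
  }
}
```

Since the delete entries are dropped along with the files they mark, a compacted log ends up holding only add entries.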
Version
FileStreamSinkLog uses 1 as the version of the metadata log.
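The version ends up as the first line of every metadata log file. As a sketch, assuming the layout Spark's CompactibleFileStreamLog writes (a `v1` header line followed by one JSON-encoded SinkFileStatus per line), the header can be produced and checked as below. `LogFileFormat` is hypothetical, the entries are treated as opaque strings, and the error messages are not Spark's.

```scala
// Hypothetical writer/reader for the "v<version>" header of a metadata log file.
object LogFileFormat {
  val Version = 1 // FileStreamSinkLog's metadata log version

  // Writes "v<version>" followed by one (already serialized) entry per line.
  def format(entries: Seq[String], version: Int = Version): String =
    (s"v$version" +: entries).mkString("\n")

  // Reads the header line and returns the entries, rejecting malformed
  // headers and versions newer than the supported one.
  def parse(text: String, maxSupportedVersion: Int = Version): Seq[String] = {
    val lines = text.split("\n").toSeq
    val header = lines.head
    require(header.startsWith("v"), s"Log file was malformed: $header")
    val version = scala.util.Try(header.drop(1).toInt).getOrElse(
      throw new IllegalArgumentException(s"Log file was malformed: $header"))
    if (version > maxSupportedVersion)
      throw new IllegalStateException(
        s"Unsupported log version v$version (max supported: v$maxSupportedVersion)")
    lines.tail
  }
}
```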
Actions
Add
FileStreamSinkLog uses the add action for the entries (SinkFileStatuses) of new metadata logs.
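New files land in the log as add entries. A minimal sketch of that conversion, assuming it follows Spark's `SinkFileStatus.apply(FileStatus)`, which copies the Hadoop file metadata and sets the action to `add`. `WrittenFile` is a hypothetical stand-in for Hadoop's `FileStatus` so the example runs without Hadoop, and `AddEntries` is likewise hypothetical.

```scala
// Hypothetical stand-in for the Hadoop FileStatus of a file the sink just wrote.
final case class WrittenFile(
    path: String, length: Long, isDirectory: Boolean,
    modificationTime: Long, replication: Int, blockSize: Long)

// Same fields as Spark's SinkFileStatus.
final case class SinkFileStatus(
    path: String, size: Long, isDir: Boolean, modificationTime: Long,
    blockReplication: Int, blockSize: Long, action: String)

object AddEntries {
  val AddAction = "add" // FileStreamSinkLog.ADD_ACTION

  // In this sketch, each written file becomes one "add" entry of the batch's log.
  def toAddEntry(f: WrittenFile): SinkFileStatus =
    SinkFileStatus(f.path, f.length, f.isDirectory, f.modificationTime,
      f.replication, f.blockSize, AddAction)
}
```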
Delete
FileStreamSinkLog uses the delete action to mark files (their SinkFileStatuses) to be excluded from compaction.
Important
The delete action is not used in Spark Structured Streaming and will be removed in 3.1.0.