CheckpointFileManager

CheckpointFileManager is an abstraction of checkpoint managers that manage checkpoint files (metadata of streaming batches) on Hadoop DFS-compatible file systems.

CheckpointFileManager is created using the class named by the spark.sql.streaming.checkpointFileManagerClass configuration property, if defined. Otherwise, creation falls back to the built-in checkpoint managers.
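As an illustration, the property could be set when building a SparkSession. This is a minimal sketch that needs Spark on the classpath. The class name com.example.MyCheckpointFileManager is hypothetical, and the sketch assumes session-level settings are visible to the Hadoop configuration that create reads.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical class name: replace it with a real CheckpointFileManager
// implementation that is available on the classpath.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("custom-checkpoint-file-manager")
  .config(
    "spark.sql.streaming.checkpointFileManagerClass",
    "com.example.MyCheckpointFileManager")
  .getOrCreate()
```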

Contract (Subset)

createAtomic

createAtomic(
  path: Path,
  overwriteIfPossible: Boolean): CancellableFSDataOutputStream

Used when:

createCheckpointDirectory

createCheckpointDirectory(): Path

Creates the checkpoint path

Used when:

  • ResolveWriteToStream is requested to resolveCheckpointLocation

Implementations

Creating CheckpointFileManager

create(
  path: Path,
  hadoopConf: Configuration): CheckpointFileManager

create uses spark.sql.streaming.checkpointFileManagerClass as the name of the implementation to instantiate.

If the property is undefined, create first tries to create a FileContextBasedCheckpointFileManager and, if that throws an UnsupportedFileSystemException, falls back to a FileSystemBasedCheckpointFileManager.

create prints out the following WARN message to the logs when an UnsupportedFileSystemException is thrown:

Could not use FileContext API for managing Structured Streaming checkpoint files at [path].
Using FileSystem API instead for managing log files.
If the implementation of FileSystem.rename() is not atomic, then the correctness and fault-tolerance of your Structured Streaming is not guaranteed.
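The selection order described above can be sketched as a small, self-contained model. It is not the Spark source. Manager, FileContextBased, FileSystemBased, Custom, CreateSketch and the local UnsupportedFileSystemException are hypothetical stand-ins, and the rule that only hdfs:// paths support the FileContext API is an assumption made purely so that the fallback branch can be exercised.

```scala
// Stand-in types (hypothetical): they model the behavior described above
// and are NOT the Spark or Hadoop classes with similar names.
final class UnsupportedFileSystemException(msg: String) extends Exception(msg)

sealed trait Manager
final case class Custom(className: String, path: String) extends Manager
final case class FileContextBased(path: String) extends Manager
final case class FileSystemBased(path: String) extends Manager

object CreateSketch {
  val ConfKey = "spark.sql.streaming.checkpointFileManagerClass"

  // Assumption for this sketch only: pretend that just hdfs:// paths
  // can be served through the FileContext API.
  private def newFileContextBased(path: String): Manager =
    if (path.startsWith("hdfs://")) FileContextBased(path)
    else throw new UnsupportedFileSystemException(s"FileContext API unavailable for $path")

  def create(path: String, conf: Map[String, String]): Manager =
    conf.get(ConfKey) match {
      // 1. A configured class name takes precedence.
      case Some(className) => Custom(className, path)
      // 2. Otherwise try the FileContext-based manager first.
      case None =>
        try newFileContextBased(path)
        catch {
          // 3. Fall back to the FileSystem-based manager and warn.
          case _: UnsupportedFileSystemException =>
            println(s"WARN falling back to the FileSystem API for $path")
            FileSystemBased(path)
        }
    }
}
```

In this model, CreateSketch.create("s3a://bucket/cp", Map.empty) takes the fallback branch and returns a FileSystemBased manager, while any configuration that defines the property short-circuits to Custom regardless of the path.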

create is used when: