Skip to content

CommitLog — HDFSMetadataLog for Offset Commit Log

CommitLog is an HDFSMetadataLog with CommitMetadata metadata.

CommitLog is the offset commit log of streaming query execution engines.

[[CommitMetadata]][[nextBatchWatermarkMs]] CommitLog uses CommitMetadata for the metadata with nextBatchWatermarkMs attribute (of type Long and the default 0).

CommitLog <> commit metadata to files with names that are offsets.

$ ls -tr [checkpoint-directory]/commits
0 1 2 3 4 5 6 7 8 9

$ cat [checkpoint-directory]/commits/8
{"nextBatchWatermarkMs": 0}

[[VERSION]] CommitLog uses 1 for the version.

[[creating-instance]] CommitLog (like the parent HDFSMetadataLog) takes the following to be created:

  • [[sparkSession]] SparkSession
  • [[path]] Path of the metadata log directory

=== [[serialize]] Serializing Metadata (Writing Metadata to Persistent Storage) -- serialize Method

[source, scala]

serialize( metadata: CommitMetadata, out: OutputStream): Unit

serialize writes out the <> prefixed with v on a single line (e.g. v1) followed by the given CommitMetadata in JSON format.

serialize is part of HDFSMetadataLog abstraction.

=== [[deserialize]] Deserializing Metadata -- deserialize Method

[source, scala]

deserialize(in: InputStream): CommitMetadata

deserialize simply reads (deserializes) two lines from the given InputStream for version and the <> attribute.

deserialize is part of HDFSMetadataLog abstraction.

=== [[add-batchId]] add Method

[source, scala]

add(batchId: Long): Unit


NOTE: add is used when...FIXME

=== [[add-batchId-metadata]] add Method

[source, scala]

add(batchId: Long, metadata: String): Boolean


add is part of MetadataLog abstraction.