ReliableRDDCheckpointData

ReliableRDDCheckpointData is an RDDCheckpointData for Reliable Checkpointing.

Creating Instance

ReliableRDDCheckpointData takes an RDD[T] (the RDD to checkpoint) to be created.

ReliableRDDCheckpointData is created when the RDD.checkpoint operator is called.
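As a minimal sketch of how an application reaches this point, the following program marks an RDD for reliable checkpointing with RDD.checkpoint and then runs an action. It assumes Spark is on the classpath; the master URL, application name, and the /tmp/spark-checkpoints directory are illustrative choices, not values from this page.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object ReliableCheckpointDemo {
  def main(args: Array[String]): Unit = {
    // Master URL, app name, and directory below are illustrative assumptions.
    val conf = new SparkConf()
      .setMaster("local[*]")
      .setAppName("reliable-checkpoint-demo")
    val sc = new SparkContext(conf)
    try {
      // Application-wide checkpoint directory. On a real cluster this should
      // point at a fault-tolerant file system such as HDFS.
      sc.setCheckpointDir("/tmp/spark-checkpoints")

      val rdd = sc.parallelize(1 to 100).map(_ * 2)

      // Marks the RDD for reliable checkpointing; nothing is written yet.
      rdd.checkpoint()

      // Running an action materializes the RDD and triggers the checkpoint write.
      rdd.count()

      println(s"checkpointed: ${rdd.isCheckpointed}")
      println(s"checkpoint file: ${rdd.getCheckpointFile}")
    } finally {
      sc.stop()
    }
  }
}
```

RDD.checkpoint has to be called before any job has been executed on the RDD, which is why it appears ahead of the count action here.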

Checkpoint Directory

ReliableRDDCheckpointData creates a subdirectory of the application-wide checkpoint directory for checkpointing the given RDD.

The name of the subdirectory uses the unique identifier of the RDD:

rdd-[id]
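The naming scheme can be sketched in plain Scala as follows. This is not Spark's own code: checkpointPath is a hypothetical helper, and it uses java.nio.file paths to stay dependency-free, whereas Spark works with Hadoop file system paths.

```scala
import java.nio.file.{Path, Paths}

object CheckpointDirSketch {
  // Hypothetical helper (not part of Spark): builds the per-RDD subdirectory
  // under an application-wide checkpoint directory using the rdd-[id] scheme.
  def checkpointPath(checkpointDir: String, rddId: Int): Path =
    Paths.get(checkpointDir).resolve(s"rdd-$rddId")

  def main(args: Array[String]): Unit = {
    // The directory and the RDD id are made-up example values.
    val cpDir = checkpointPath("/tmp/checkpoints", 7)
    println(cpDir.getFileName) // prints "rdd-7"
  }
}
```

Because the RDD id is unique within a SparkContext, each checkpointed RDD gets its own subdirectory and checkpoints of different RDDs do not collide.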

Checkpointing RDD

doCheckpoint(): CheckpointRDD[T]

doCheckpoint writes the RDD to the checkpoint directory, which creates a new RDD.

With the spark.cleaner.referenceTracking.cleanCheckpoints configuration property enabled, doCheckpoint requests the ContextCleaner to registerRDDCheckpointDataForCleanup for the new RDD.
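As an illustration, the property can be switched on when building the SparkConf for an application. Only the property name comes from the text above; the rest is a minimal sketch that assumes Spark is on the classpath.

```scala
import org.apache.spark.SparkConf

// Enable cleanup of checkpoint files once the checkpointed RDD is no longer
// referenced. The property is disabled unless set explicitly.
val conf = new SparkConf()
  .set("spark.cleaner.referenceTracking.cleanCheckpoints", "true")
```

The setting has to be in place before the SparkContext is created, since that is when the configuration is read.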

In the end, doCheckpoint prints out the following INFO message to the logs (the first [id] is the ID of the checkpointed RDD, the second is the ID of the new RDD) and returns the new RDD:

Done checkpointing RDD [id] to [cpDir], new parent is RDD [id]

doCheckpoint is part of the RDDCheckpointData abstraction.