ReliableRDDCheckpointData is an RDDCheckpointData for Reliable Checkpointing.

Creating Instance

ReliableRDDCheckpointData takes the following to be created:

  • [[rdd]] RDD[T]

ReliableRDDCheckpointData is created for the RDD.checkpoint operator.

== [[cpDir]][[checkpointPath]] Checkpoint Directory

ReliableRDDCheckpointData creates a subdirectory of the application-wide checkpoint directory for checkpointing the given RDD.

The name of the subdirectory uses the unique identifier of the RDD.
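
As a rough illustration of how such a path could be derived, here is a minimal sketch in plain Scala. The `CheckpointPaths` object and its `checkpointPath` helper are hypothetical names used only for this example, and the sketch assumes the subdirectory is named with an `rdd-` prefix followed by the RDD id, which the text above does not spell out.

```scala
object CheckpointPaths {
  // Hypothetical helper (not Spark's API): joins the application-wide
  // checkpoint directory, if one is set, with an "rdd-<id>" subdirectory.
  // The "rdd-" prefix is an assumption made for this sketch.
  def checkpointPath(checkpointDir: Option[String], rddId: Int): Option[String] =
    checkpointDir.map { dir =>
      // Drop a trailing slash so the separator is not doubled.
      s"${dir.stripSuffix("/")}/rdd-$rddId"
    }
}
```

Modelling the directory as an `Option` reflects that no checkpoint path can be computed when the application-wide checkpoint directory has not been set.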



== [[doCheckpoint]] Checkpointing RDD

[source, scala]
----
doCheckpoint(): CheckpointRDD[T]
----

doCheckpoint writes the RDD to the checkpoint directory (which creates a new RDD).

With the spark.cleaner.referenceTracking.cleanCheckpoints configuration property enabled, doCheckpoint requests the ContextCleaner to registerRDDCheckpointDataForCleanup for the new RDD.
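
Enabling the property from application code might look like the sketch below. It assumes Spark is on the classpath and runs in local mode; the `CheckpointDemo` object, the application name, and the temporary checkpoint directory are illustrative choices, not anything prescribed by Spark.

```scala
import java.nio.file.Files
import org.apache.spark.{SparkConf, SparkContext}

object CheckpointDemo {
  // Runs a tiny local job and reports whether the RDD was checkpointed
  // and where Spark says the checkpoint files live.
  def run(): (Boolean, Option[String]) = {
    val conf = new SparkConf()
      .setAppName("checkpoint-demo") // hypothetical application name
      .setMaster("local[2]")
      // Opt in to cleanup of checkpoint files for RDDs that are no longer referenced.
      .set("spark.cleaner.referenceTracking.cleanCheckpoints", "true")
    val sc = new SparkContext(conf)
    try {
      // Hypothetical scratch location for the application-wide checkpoint directory.
      sc.setCheckpointDir(Files.createTempDirectory("checkpoints").toString)
      val rdd = sc.parallelize(1 to 100).map(_ * 2)
      rdd.checkpoint() // only marks the RDD for checkpointing
      rdd.count()      // the first action triggers the actual write
      (rdd.isCheckpointed, rdd.getCheckpointFile)
    } finally sc.stop()
  }
}
```

Note that `checkpoint()` has to be called before the first action on the RDD; the write itself happens as part of that action.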

In the end, doCheckpoint prints out the following INFO message to the logs and returns the new RDD.


Done checkpointing RDD [id] to [cpDir], new parent is RDD [id]
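
The order of the steps above can be pictured with a toy model in plain Scala. Nothing here is Spark's code: `ToyRDD`, `ToyCleaner` and `ToyCheckpointData` are hypothetical stand-ins, and "writing" is reduced to copying the data into a new object so that only the control flow remains.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical stand-in for an RDD: just an id and some data.
final case class ToyRDD(id: Int, data: Seq[Int])

// Hypothetical stand-in for the ContextCleaner: records registered RDD ids.
final class ToyCleaner {
  val registered: ListBuffer[Int] = ListBuffer.empty[Int]
  def registerForCleanup(rddId: Int): Unit = registered += rddId
}

final class ToyCheckpointData(
    rdd: ToyRDD,
    cpDir: String,
    cleanCheckpoints: Boolean, // models the cleanCheckpoints property
    cleaner: ToyCleaner,
    nextId: () => Int,         // supplies the id of the new RDD
    log: String => Unit) {

  def doCheckpoint(): ToyRDD = {
    // Step 1: "write" the RDD, which yields a new RDD.
    val newRDD = ToyRDD(nextId(), rdd.data)
    // Step 2: register the new RDD for cleanup only when the property is enabled.
    if (cleanCheckpoints) cleaner.registerForCleanup(newRDD.id)
    // Step 3: log the outcome and return the new RDD.
    log(s"Done checkpointing RDD ${rdd.id} to $cpDir, new parent is RDD ${newRDD.id}")
    newRDD
  }
}
```

Passing the cleaner, id supplier and logger in as parameters keeps the model free of global state, so each of the three steps can be observed separately.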

doCheckpoint is part of the RDDCheckpointData abstraction.

Last update: 2020-11-27