Skip to content

ContinuousWriteRDD -- RDD of WriteToContinuousDataSourceExec Unary Physical Operator

ContinuousWriteRDD is a specialized RDD (RDD[Unit]) that is used exclusively as the underlying RDD of WriteToContinuousDataSourceExec unary physical operator to <>.

ContinuousWriteRDD is <> exclusively when WriteToContinuousDataSourceExec unary physical operator is requested to <>.

[[partitioner]] [[getPartitions]] ContinuousWriteRDD uses the <> for the partitions and the partitioner.

[[creating-instance]] ContinuousWriteRDD takes the following to be created:

  • [[prev]] Parent RDD (RDD[InternalRow])
  • [[writeTask]] Write task (DataWriterFactory[InternalRow])

=== [[compute]] Computing Partition -- compute Method

[source, scala]

compute( split: Partition, context: TaskContext): Iterator[Unit]


NOTE: compute is part of the RDD Contract to compute a partition.

compute requests the EpochCoordinatorRef helper for a <> (using the <>).

NOTE: The <> runs on the driver as the single point to coordinate epochs across partition tasks.

compute uses the EpochTracker helper to <> (using the <> local property).

[[compute-loop]] compute then executes the following steps (in a loop) until the task (as the given TaskContext) is killed or completed.

compute requests the <> to compute the given partition (that gives an Iterator[InternalRow]).

compute requests the <> to create a DataWriter (for the partition and the task attempt IDs from the given TaskContext and the <> from the EpochTracker helper) and requests it to write all records (from the Iterator[InternalRow]).

compute prints out the following INFO message to the logs:

Writer for partition [partitionId] in epoch [epoch] is committing.

compute requests the DataWriter to commit (that gives a WriterCommitMessage).

compute requests the EpochCoordinator RPC endpoint reference to send out a <> message (with the WriterCommitMessage).

compute prints out the following INFO message to the logs:

Writer for partition [partitionId] in epoch [epoch] is committed.

In the end (of the loop), compute uses the EpochTracker helper to <>.

In case of an error, compute prints out the following ERROR message to the logs and requests the DataWriter to abort.

Writer for partition [partitionId] is aborting.

In the end, compute prints out the following ERROR message to the logs:

Writer for partition [partitionId] aborted.