MapPartitionsRDD is an RDD that has exactly one narrow, one-to-one dependency on the parent RDD and "describes" a distributed computation that applies the given function to every partition of the parent RDD.

MapPartitionsRDD is created when:

By default, MapPartitionsRDD does not preserve partitioning (the preservesPartitioning input parameter is false). If preservesPartitioning is true, it retains the partitioner of the parent RDD.
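The effect of preservesPartitioning can be observed through the public RDD API. The following is a minimal sketch, assuming Spark is on the classpath and a local master is acceptable; the object name, the sample data, and the partition count of 4 are illustrative choices, not part of the original text.

```scala
import org.apache.spark.{HashPartitioner, Partitioner, SparkConf, SparkContext}

object PreservesPartitioningDemo {

  // Returns the partitioner seen after mapPartitions,
  // first with the default flag and then with preservesPartitioning = true.
  def partitioners(sc: SparkContext): (Option[Partitioner], Option[Partitioner]) = {
    // A pair RDD with an explicit partitioner, so there is something to preserve.
    val pairs = sc
      .parallelize(Seq(1 -> "a", 2 -> "b", 3 -> "c"))
      .partitionBy(new HashPartitioner(4))

    // Default (preservesPartitioning = false): the partitioner is dropped.
    val dropped = pairs.mapPartitions(iter => iter)

    // Explicit opt-in: the partitioner of the parent RDD is kept.
    val kept = pairs.mapPartitions(iter => iter, preservesPartitioning = true)

    (dropped.partitioner, kept.partitioner)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: local mode with two threads is enough for this demonstration.
    val conf = new SparkConf().setMaster("local[2]").setAppName("preserves-partitioning-demo")
    val sc = new SparkContext(conf)
    try {
      val (dropped, kept) = partitioners(sc)
      println(s"default flag: $dropped")
      println(s"preservesPartitioning = true: $kept")
    } finally {
      sc.stop()
    }
  }
}
```

Setting the flag to true is only safe when the function leaves the keys untouched, since Spark does not re-check that records still belong to the partitions the partitioner would assign them to.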

MapPartitionsRDD is the result of the following transformations:

When requested for the isBarrier_ flag, MapPartitionsRDD returns true if the isFromBarrier flag is enabled or if any RDD among its dependencies is barrier-enabled.
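The rule above can be sketched with a simplified model. SketchRDD, SourceRDD and MapPartitionsSketch are hypothetical stand-ins written for this illustration, not Spark classes; they only mirror the "own flag or any barrier-enabled parent" logic.

```scala
// Hypothetical, simplified stand-ins for an RDD lineage (not Spark classes).
sealed trait SketchRDD {
  def parents: Seq[SketchRDD]
  def isBarrier: Boolean
}

// A leaf RDD with no parents; never barrier-enabled in this model.
final case class SourceRDD(name: String) extends SketchRDD {
  val parents: Seq[SketchRDD] = Nil
  val isBarrier: Boolean = false
}

// Models MapPartitionsRDD: exactly one parent plus the isFromBarrier flag.
final case class MapPartitionsSketch(
    parent: SketchRDD,
    isFromBarrier: Boolean = false) extends SketchRDD {
  val parents: Seq[SketchRDD] = Seq(parent)

  // Barrier-enabled if created with isFromBarrier = true,
  // or if any parent RDD is itself barrier-enabled.
  def isBarrier: Boolean = isFromBarrier || parents.exists(_.isBarrier)
}

object BarrierFlagDemo {
  def main(args: Array[String]): Unit = {
    val source = SourceRDD("source")
    val plain = MapPartitionsSketch(source)
    val barrier = MapPartitionsSketch(source, isFromBarrier = true)
    val downstream = MapPartitionsSketch(barrier) // flag off, parent is barrier-enabled

    println(plain.isBarrier)      // false
    println(barrier.isBarrier)    // true
    println(downstream.isBarrier) // true
  }
}
```

In this model the flag propagates to every MapPartitionsSketch built on top of a barrier-enabled parent, which is why downstream reports true even though its own isFromBarrier is false.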

Creating MapPartitionsRDD Instance

MapPartitionsRDD takes the following to be created:

  • Parent RDD (RDD[T])

  • Function to execute on partitions (given the task context, the partition index, and the records of the partition)

    (TaskContext, Int, Iterator[T]) => Iterator[U]

  • preservesPartitioning flag (default: false)

  • isFromBarrier flag for Barrier Execution Mode (default: false)

  • isOrderSensitive flag (default: false)
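To illustrate the shape of the partition function listed above, the following sketch shows how simpler user functions could be adapted to it. TaskContextStub is a hypothetical stand-in for org.apache.spark.TaskContext so the example runs without Spark, and the helper names fromMap and fromIndexed are assumptions made for this illustration.

```scala
// Hypothetical stand-in for org.apache.spark.TaskContext (keeps the sketch Spark-free).
final case class TaskContextStub(stageId: Int)

object PartitionFunctionSketch {

  // Same shape as the function MapPartitionsRDD takes:
  // (task context, partition index, input records) => output records.
  type PartitionFn[T, U] = (TaskContextStub, Int, Iterator[T]) => Iterator[U]

  // Lifts an element-wise function; the context and the partition index are ignored.
  def fromMap[T, U](f: T => U): PartitionFn[T, U] =
    (_, _, iter) => iter.map(f)

  // Lifts an index-aware per-partition function; only the context is ignored.
  def fromIndexed[T, U](f: (Int, Iterator[T]) => Iterator[U]): PartitionFn[T, U] =
    (_, index, iter) => f(index, iter)

  def main(args: Array[String]): Unit = {
    val ctx = TaskContextStub(stageId = 0)

    val doubled = fromMap[Int, Int](_ * 2)(ctx, 0, Iterator(1, 2, 3)).toList
    val tagged = fromIndexed[Int, String]((idx, it) => it.map(x => s"p$idx:$x"))(
      ctx, 1, Iterator(7, 8)).toList

    println(doubled) // List(2, 4, 6)
    println(tagged)  // List(p1:7, p1:8)
  }
}
```

Because the function works on iterators, records are processed lazily one at a time rather than being materialized per partition.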