Skip to content


MapPartitionsRDD is an RDD that has exactly one-to-one narrow dependency on the <> and "describes" a distributed computation of the given <> to every RDD partition.

MapPartitionsRDD is <> when:

  • PairRDDFunctions (RDD[(K, V)]) is requested to[mapValues] and[flatMapValues] (with the <> flag enabled)

  • RDD[T] is requested to <>, <>, <>, <>, <>, <>, <>, and <>

  • RDDBarrier[T] is requested to <> (with the <> flag enabled)

By default, it does not preserve partitioning -- the last input parameter preservesPartitioning is false. If it is true, it retains the original RDD's partitioning.

MapPartitionsRDD is the result of the following transformations:

  • filter
  • glom
  • mapPartitionsWithIndex

[[isBarrier_]] When requested for the[isBarrier_] flag, MapPartitionsRDD gives the <> flag or check whether any of the RDDs of the[RDD dependencies] are[barrier-enabled].

=== [[creating-instance]] Creating MapPartitionsRDD Instance

MapPartitionsRDD takes the following to be created:

  • [[prev]] Parent[RDD] (RDD[T])
  • [[f]] Function to execute on partitions +
    (TaskContext, partitionID, Iterator[T]) => Iterator[U]
  • [[preservesPartitioning]] preservesPartitioning flag (default: false)
  • [[isFromBarrier]] isFromBarrier flag for <> (default: false)
  • [[isOrderSensitive]] isOrderSensitive flag (default: false)