MapPartitionsRDD

MapPartitionsRDD is an RDD that has exactly one dependency, a one-to-one narrow dependency on the parent RDD, and "describes" a distributed computation that applies the given function to every RDD partition.

MapPartitionsRDD is created when:

  • PairRDDFunctions (RDD[(K, V)]) is requested to rdd:PairRDDFunctions.md#mapValues[mapValues] and rdd:PairRDDFunctions.md#flatMapValues[flatMapValues] (with the preservesPartitioning flag enabled)

  • RDD[T] is requested for one of its partition-wise transformations, including map, flatMap, filter, glom, mapPartitions, and mapPartitionsWithIndex

  • RDDBarrier[T] is requested to mapPartitions (with the isFromBarrier flag enabled)

By default, MapPartitionsRDD does not preserve partitioning: the preservesPartitioning flag is false, so the resulting RDD has no partitioner. If the flag is true, MapPartitionsRDD keeps the partitioner of the parent RDD.
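
The effect of the flag can be observed through the public `partitioner` property. The following is a minimal sketch, assuming a SparkContext is available; the object name `PartitionerDemo`, the sample data, and the choice of `HashPartitioner(4)` are illustrative assumptions, not part of the original text.

```scala
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

object PartitionerDemo {
  // Returns whether a partitioner survives map, mapValues, and
  // mapPartitions(preservesPartitioning = true), in that order.
  def partitionerKept(sc: SparkContext): (Boolean, Boolean, Boolean) = {
    val byKey = sc
      .parallelize(Seq(1 -> "a", 2 -> "b", 3 -> "c"))
      .partitionBy(new HashPartitioner(4))

    // map may change keys, so it uses the default (false)
    val viaMap = byKey.map { case (k, v) => (k, v.toUpperCase) }
    // mapValues cannot change keys, so it enables the flag
    val viaMapValues = byKey.mapValues(_.toUpperCase)
    // mapPartitions lets the caller opt in explicitly
    val viaMapPartitions = byKey.mapPartitions(it => it, preservesPartitioning = true)

    (viaMap.partitioner.isDefined,
     viaMapValues.partitioner.isDefined,
     viaMapPartitions.partitioner.isDefined)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("PartitionerDemo")
    val sc = new SparkContext(conf)
    try println(partitionerKept(sc)) finally sc.stop()
  }
}
```

Only the `map` result loses the partitioner. Opting in with `preservesPartitioning = true` is safe only when the function leaves the keys untouched.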

MapPartitionsRDD is the result of the following transformations:

  • filter
  • glom
  • spark-rdd-transformations.md#mapPartitions[mapPartitions]
  • mapPartitionsWithIndex
  • rdd:PairRDDFunctions.md#mapValues[PairRDDFunctions.mapValues]
  • rdd:PairRDDFunctions.md#flatMapValues[PairRDDFunctions.flatMapValues]
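
One way to check the list above is to inspect the runtime class of the RDD that each transformation returns. This is a minimal sketch, assuming a SparkContext is available; the object name `WhichRDD` and the sample data are hypothetical and only serve the illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object WhichRDD {
  // Maps each transformation name to the simple class name of its result.
  def classNames(sc: SparkContext): Map[String, String] = {
    val nums = sc.parallelize(1 to 8, numSlices = 2)
    val pairs = nums.map(n => (n % 2, n))

    val results: Map[String, RDD[_]] = Map(
      "filter" -> nums.filter(_ % 2 == 0),
      "glom" -> nums.glom(),
      "mapPartitions" -> nums.mapPartitions(it => it.map(_ + 1)),
      "mapPartitionsWithIndex" ->
        nums.mapPartitionsWithIndex((idx, it) => it.map(n => (idx, n))),
      "mapValues" -> pairs.mapValues(_ * 10),
      "flatMapValues" -> pairs.flatMapValues(v => Seq(v, v))
    )
    results.map { case (name, rdd) => name -> rdd.getClass.getSimpleName }
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("WhichRDD")
    val sc = new SparkContext(conf)
    try classNames(sc).foreach { case (name, cls) => println(s"$name -> $cls") }
    finally sc.stop()
  }
}
```

`RDD.toDebugString` gives a similar view of the whole lineage when only a quick look in spark-shell is needed.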

[[isBarrier_]] When requested for the rdd:RDD.md#isBarrier_[isBarrier_] flag, MapPartitionsRDD returns true if the isFromBarrier flag is enabled; otherwise it checks whether any of the RDDs of the rdd:RDD.md#dependencies[RDD dependencies] is rdd:RDD.md#isBarrier[barrier-enabled].
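
The rule above is an "own flag or any parent" check, and a toy model may make it easier to follow. The sketch below does not use Spark classes: `Node` is a hypothetical stand-in for an RDD that carries only a name, the isFromBarrier flag, and its parents.

```scala
object BarrierFlagModel {
  // Hypothetical stand-in for an RDD: a name, its own flag, and its parents.
  final case class Node(
      name: String,
      isFromBarrier: Boolean = false,
      parents: Seq[Node] = Nil) {
    // Own flag first, then fall back to the lineage.
    def isBarrier: Boolean = isFromBarrier || parents.exists(_.isBarrier)
  }

  def main(args: Array[String]): Unit = {
    val source = Node("source")
    val barrierStep =
      Node("barrier.mapPartitions", isFromBarrier = true, parents = Seq(source))
    val downstream = Node("map", parents = Seq(barrierStep))
    Seq(source, barrierStep, downstream)
      .foreach(n => println(s"${n.name}: ${n.isBarrier}"))
  }
}
```

In this model, a node created after a barrier step reports itself as barrier-enabled even though its own flag is off, because the check reaches it through the parent.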

=== [[creating-instance]] Creating MapPartitionsRDD Instance

MapPartitionsRDD takes the following to be created:

  • [[prev]] Parent rdd:RDD.md[RDD] (RDD[T])
  • [[f]] Function to execute on partitions: (TaskContext, partitionID, Iterator[T]) => Iterator[U]
  • [[preservesPartitioning]] preservesPartitioning flag (default: false)
  • [[isFromBarrier]] isFromBarrier flag for the isBarrier_ flag (default: false)
  • [[isOrderSensitive]] isOrderSensitive flag (default: false)
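
To make the shape of the partition function concrete, here is a small model in plain Scala with no Spark dependency. `Ctx` is a hypothetical stand-in for TaskContext, and `liftMap`, `liftFilter`, `tagWithIndex`, and `run` are illustrative names; the sketch only shows how element-wise operations could be expressed as a function over a whole partition iterator.

```scala
object PartitionFunctionModel {
  // Hypothetical stand-in for TaskContext.
  final case class Ctx(stageId: Int)

  // Same shape as the function MapPartitionsRDD takes:
  // (context, partition index, input iterator) => output iterator
  type PartitionFn[T, U] = (Ctx, Int, Iterator[T]) => Iterator[U]

  // Element-wise operations ignore the context and the partition index.
  def liftMap[T, U](g: T => U): PartitionFn[T, U] =
    (_, _, iter) => iter.map(g)

  def liftFilter[T](p: T => Boolean): PartitionFn[T, T] =
    (_, _, iter) => iter.filter(p)

  // An index-aware operation uses the second argument.
  def tagWithIndex[T]: PartitionFn[T, (Int, T)] =
    (_, idx, iter) => iter.map(x => (idx, x))

  // Applies f to every partition independently, as a one-to-one
  // dependency implies: output partition i reads only input partition i.
  def run[T, U](partitions: Seq[Seq[T]], f: PartitionFn[T, U]): Seq[Vector[U]] =
    partitions.zipWithIndex.map { case (part, idx) =>
      f(Ctx(stageId = 0), idx, part.iterator).toVector
    }

  def main(args: Array[String]): Unit = {
    val data = Seq(Seq(1, 2, 3), Seq(4, 5))
    println(run(data, liftMap[Int, Int](_ * 2)))
    println(run(data, liftFilter[Int](_ % 2 == 0)))
    println(run(data, tagWithIndex[Int]))
  }
}
```

Working on iterators keeps each partition lazy: elements are pulled through the function one at a time rather than materialized up front.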

Last update: 2020-10-09