Skip to content

MapPartitions Unary Logical Operator

MapPartitions is a unary logical operator to represent Dataset.mapPartitions operator in a logical query plan.

MapPartitions is an ObjectConsumer and an ObjectProducer.

Creating Instance

MapPartitions takes the following to be created:

MapPartitions is created (indirectly using apply utility) when:

Query Planning

MapPartitions is planned as MapPartitionsExec physical operator when BasicOperators execution planning strategy is executed.

Creating MapPartitions

apply[T : Encoder, U : Encoder](
  func: Iterator[T] => Iterator[U],
  child: LogicalPlan): LogicalPlan

apply creates a MapPartitions unary logical operator with a DeserializeToObject child (for the type T of the input objects) and a SerializeFromObject parent (for the type U of the output objects).

apply is used when:

Demo

val ds = spark.range(5)
val fn: Iterator[Long] => Iterator[String] = { ns =>
  ns.map { n =>
    if (n % 2 == 0) {
      s"even ($n)"
    } else {
      s"odd ($n)"
    }
  }
}
ds.mapPartitions(fn) // FIXME That does not seem to work!