MapPartitions Unary Logical Operator¶
MapPartitions is a unary logical operator to represent Dataset.mapPartitions operator in a logical query plan.
MapPartitions is an ObjectConsumer and an ObjectProducer.
Creating Instance¶
MapPartitions takes the following to be created:
- Map Partitions Function (
Iterator[Any] => Iterator[Any]) - Output Object Attribute
- Child Logical Plan
MapPartitions is created (indirectly using apply utility) when:
- Dataset.mapPartitions operator is used
Query Planning¶
MapPartitions is planned as MapPartitionsExec physical operator when BasicOperators execution planning strategy is executed.
Creating MapPartitions¶
apply[T : Encoder, U : Encoder](
func: Iterator[T] => Iterator[U],
child: LogicalPlan): LogicalPlan
apply creates a MapPartitions unary logical operator with a DeserializeToObject child (for the type T of the input objects) and a SerializeFromObject parent (for the type U of the output objects).
apply is used when:
- Dataset.mapPartitions operator is used
Demo¶
val ds = spark.range(5)
val fn: Iterator[Long] => Iterator[String] = { ns =>
ns.map { n =>
if (n % 2 == 0) {
s"even ($n)"
} else {
s"odd ($n)"
}
}
}
ds.mapPartitions(fn) // FIXME That does not seem to work!