MapPartitions Unary Logical Operator¶
MapPartitions is a unary logical operator that represents the Dataset.mapPartitions operator in a logical query plan.
MapPartitions is an ObjectConsumer and an ObjectProducer.
Creating Instance¶
MapPartitions takes the following to be created:
- Map Partitions Function (Iterator[Any] => Iterator[Any])
- Output Object Attribute
- Child Logical Plan
MapPartitions is created (indirectly, using the apply utility) when:
- Dataset.mapPartitions operator is used
Query Planning¶
MapPartitions is planned as the MapPartitionsExec physical operator when the BasicOperators execution planning strategy is executed.
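One way to observe this planning step is to look for MapPartitionsExec in the physical plan of a query that uses Dataset.mapPartitions. The following is a minimal sketch, assuming Spark 3.x on the classpath and a local SparkSession; the object and method names (MapPartitionsPlanningSketch, plansAsMapPartitionsExec) are hypothetical and used for illustration only.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.MapPartitionsExec

// Hypothetical helper, for illustration only.
object MapPartitionsPlanningSketch {

  // Returns true if the physical plan of a mapPartitions query
  // contains a MapPartitionsExec operator.
  def plansAsMapPartitionsExec(spark: SparkSession): Boolean = {
    import spark.implicits._

    val fn: Iterator[Long] => Iterator[String] = _.map(n => s"n=$n")
    val ds = spark.range(5).as[Long].mapPartitions(fn)

    // sparkPlan is the physical plan produced by the planning strategies,
    // before preparation rules (e.g. adaptive execution) wrap it.
    ds.queryExecution.sparkPlan
      .collectFirst { case op: MapPartitionsExec => op }
      .isDefined
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("MapPartitionsPlanningSketch")
      .getOrCreate()
    try {
      println(s"Planned as MapPartitionsExec: ${plansAsMapPartitionsExec(spark)}")
    } finally {
      spark.stop()
    }
  }
}
```

The sketch inspects queryExecution.sparkPlan rather than executedPlan, since the latter may be wrapped by adaptive query execution and would need an extra unwrapping step.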
Creating MapPartitions¶
apply[T : Encoder, U : Encoder](
  func: Iterator[T] => Iterator[U],
  child: LogicalPlan): LogicalPlan
apply creates a MapPartitions unary logical operator with a DeserializeToObject child (for the type T of the input objects) and a SerializeFromObject parent (for the type U of the output objects).
apply is used when:
- Dataset.mapPartitions operator is used
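The parent/child arrangement that apply builds can be checked by pattern matching on the analyzed logical plan of a mapPartitions query. This is a minimal sketch, assuming Spark 3.x and a local SparkSession; the names MapPartitionsLogicalShapeSketch and hasExpectedShape are hypothetical, and the Catalyst classes involved are internal APIs whose constructors may differ between Spark versions.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.plans.logical.{DeserializeToObject, MapPartitions, SerializeFromObject}

// Hypothetical helper, for illustration only.
object MapPartitionsLogicalShapeSketch {

  // Returns true if the analyzed plan has the shape
  // SerializeFromObject -> MapPartitions -> DeserializeToObject.
  def hasExpectedShape(spark: SparkSession): Boolean = {
    import spark.implicits._

    val fn: Iterator[Long] => Iterator[String] = _.map(n => s"n=$n")
    val ds = spark.range(3).as[Long].mapPartitions(fn)

    ds.queryExecution.analyzed match {
      // SerializeFromObject (for U) on top, DeserializeToObject (for T) below.
      case SerializeFromObject(_, MapPartitions(_, _, _: DeserializeToObject)) => true
      case _ => false
    }
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("MapPartitionsLogicalShapeSketch")
      .getOrCreate()
    try {
      println(s"Expected shape: ${hasExpectedShape(spark)}")
    } finally {
      spark.stop()
    }
  }
}
```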
Demo¶
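import spark.implicits._ // encoders for Long and String (imported automatically in spark-shell)
// spark.range gives a Dataset[java.lang.Long]; convert it so that fn's Iterator[Long] input type matches
val ds = spark.range(5).as[Long]
val fn: Iterator[Long] => Iterator[String] = { ns =>
  ns.map { n =>
    if (n % 2 == 0) {
      s"even ($n)"
    } else {
      s"odd ($n)"
    }
  }
}
ds.mapPartitions(fn).show()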