FlatMapGroupsWithState Logical Operator
FlatMapGroupsWithState is a binary logical operator that represents the following KeyValueGroupedDataset high-level operators:
- mapGroupsWithState
- flatMapGroupsWithState
BinaryNode
FlatMapGroupsWithState is a binary logical operator with two child operators:
- The child as the left operator
- The initialState as the right operator
ObjectProducer
FlatMapGroupsWithState is an ObjectProducer.
Creating Instance
FlatMapGroupsWithState takes the following to be created:
- State Mapping Function (`(Any, Iterator[Any], LogicalGroupState[Any]) => Iterator[Any]`)
- Key Deserializer Expression
- Value Deserializer Expression
- Grouping Attributes
- Data Attributes
- Output Object Attribute
- State ExpressionEncoder
- OutputMode
- isMapGroupsWithState flag (default: false)
- GroupStateTimeout
- hasInitialState flag (default: false)
- Initial State Group Attributes
- Initial State Data Attributes
- Initial State Deserializer Expression
- Initial State LogicalPlan
- Child LogicalPlan
FlatMapGroupsWithState is created using the apply factory method.
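The creation arguments listed above can be read back from a query's logical plan. The following is a minimal sketch, assuming Spark 3.2 or later (where the initial-state arguments exist) with `spark-sql` on the classpath; the object name `FlatMapGroupsWithStateFields` and the state function `countPerKey` are hypothetical and only serve the illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.plans.logical.FlatMapGroupsWithState
import org.apache.spark.sql.streaming.GroupState

object FlatMapGroupsWithStateFields {
  // Hypothetical state function: keeps a running count of rows per key
  def countPerKey(
      key: String,
      values: Iterator[(String, Int)],
      state: GroupState[Long]): (String, Long) = {
    val total = state.getOption.getOrElse(0L) + values.size
    state.update(total)
    (key, total)
  }

  // Finds the FlatMapGroupsWithState operator in the logical plan and
  // returns its (isMapGroupsWithState, hasInitialState) flags
  def flags(spark: SparkSession): Option[(Boolean, Boolean)] = {
    import spark.implicits._
    val ds = Seq(("a", 1), ("a", 2), ("b", 3)).toDS()
      .groupByKey(_._1)
      .mapGroupsWithState[Long, (String, Long)](countPerKey _)
    ds.queryExecution.logical.collectFirst {
      case f: FlatMapGroupsWithState => (f.isMapGroupsWithState, f.hasInitialState)
    }
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("fields").getOrCreate()
    println(flags(spark))
    spark.stop()
  }
}
```

Since the query uses mapGroupsWithState without an initial state, the isMapGroupsWithState flag is expected to be enabled and hasInitialState to keep its default.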
Creating FlatMapGroupsWithState

    apply[K: Encoder, V: Encoder, S: Encoder, U: Encoder](
      func: (Any, Iterator[Any], LogicalGroupState[Any]) => Iterator[Any],
      groupingAttributes: Seq[Attribute],
      dataAttributes: Seq[Attribute],
      outputMode: OutputMode,
      isMapGroupsWithState: Boolean,
      timeout: GroupStateTimeout,
      child: LogicalPlan): LogicalPlan

    apply[K: Encoder, V: Encoder, S: Encoder, U: Encoder](
      func: (Any, Iterator[Any], LogicalGroupState[Any]) => Iterator[Any],
      groupingAttributes: Seq[Attribute],
      dataAttributes: Seq[Attribute],
      outputMode: OutputMode,
      isMapGroupsWithState: Boolean,
      timeout: GroupStateTimeout,
      child: LogicalPlan,
      initialStateGroupAttrs: Seq[Attribute],
      initialStateDataAttrs: Seq[Attribute],
      initialState: LogicalPlan): LogicalPlan

apply creates a FlatMapGroupsWithState with the following:
- UnresolvedDeserializers for the keys, values and the initial state
- A generated output object attribute
In the end, apply creates a SerializeFromObject unary logical operator with the FlatMapGroupsWithState operator as the child.
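The wrapping described above can be observed on a batch query. This is a hedged sketch, assuming Spark 3.2 or later with `spark-sql` on the classpath; the object name `ApplyResult` and the state function `emitCounts` are hypothetical. It pattern-matches the top of the logical plan produced by flatMapGroupsWithState.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.plans.logical.{FlatMapGroupsWithState, SerializeFromObject}
import org.apache.spark.sql.streaming.{GroupState, GroupStateTimeout, OutputMode}

object ApplyResult {
  // Hypothetical state function: emits one (key, running count) row per group
  def emitCounts(
      key: String,
      values: Iterator[(String, Int)],
      state: GroupState[Long]): Iterator[(String, Long)] = {
    val total = state.getOption.getOrElse(0L) + values.size
    state.update(total)
    Iterator((key, total))
  }

  // True when the logical plan is a SerializeFromObject
  // whose child is a FlatMapGroupsWithState
  def isWrapped(spark: SparkSession): Boolean = {
    import spark.implicits._
    val ds = Seq(("a", 1), ("a", 2), ("b", 3)).toDS()
      .groupByKey(_._1)
      .flatMapGroupsWithState[Long, (String, Long)](
        OutputMode.Append(), GroupStateTimeout.NoTimeout())(emitCounts _)
    ds.queryExecution.logical match {
      case SerializeFromObject(_, _: FlatMapGroupsWithState) => true
      case _ => false
    }
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("apply").getOrCreate()
    println(isWrapped(spark))
    spark.stop()
  }
}
```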
apply is used for the following high-level operators:
- KeyValueGroupedDataset.mapGroupsWithState
- KeyValueGroupedDataset.flatMapGroupsWithState
Execution Planning
FlatMapGroupsWithState is planned as follows:
| Physical Operator | Execution Planning Strategy |
|---|---|
| FlatMapGroupsWithStateExec (Spark Structured Streaming) | FlatMapGroupsWithStateStrategy (Spark Structured Streaming) |
| CoGroupExec or MapGroupsExec | BasicOperators |
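For a batch query without an initial state, the second row of the table can be checked by looking for MapGroupsExec in the physical plan. The sketch below assumes Spark 3.2 or later with `spark-sql` on the classpath; the object name `BatchPlanning` and the function `countPerKey` are hypothetical. It inspects `queryExecution.sparkPlan` (the physical plan before preparation rules) rather than `executedPlan`, since adaptive query execution may wrap the latter.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.MapGroupsExec
import org.apache.spark.sql.streaming.GroupState

object BatchPlanning {
  // Hypothetical state function: keeps a running count of rows per key
  def countPerKey(
      key: String,
      values: Iterator[(String, Int)],
      state: GroupState[Long]): (String, Long) = {
    val total = state.getOption.getOrElse(0L) + values.size
    state.update(total)
    (key, total)
  }

  // True when the batch query is planned with a MapGroupsExec physical operator
  def usesMapGroupsExec(spark: SparkSession): Boolean = {
    import spark.implicits._
    val ds = Seq(("a", 1), ("a", 2), ("b", 3)).toDS()
      .groupByKey(_._1)
      .mapGroupsWithState[Long, (String, Long)](countPerKey _)
    ds.queryExecution.sparkPlan.collectFirst { case m: MapGroupsExec => m }.isDefined
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("planning").getOrCreate()
    println(usesMapGroupsExec(spark))
    spark.stop()
  }
}
```

A streaming query over the same operators would go through FlatMapGroupsWithStateStrategy instead, which this batch-only sketch does not exercise.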