flatMapGroupsWithState Operator¶
flatMapGroupsWithState is part of KeyValueGroupedDataset (Spark SQL) API for Arbitrary Stateful Streaming Aggregation with an explicit state logic.
flatMapGroupsWithState[S: Encoder, U: Encoder](
outputMode: OutputMode,
timeoutConf: GroupStateTimeout)(
func: (K, Iterator[V], GroupState[S]) => Iterator[U]): Dataset[U]
flatMapGroupsWithState[S: Encoder, U: Encoder](
outputMode: OutputMode,
timeoutConf: GroupStateTimeout,
initialState: KeyValueGroupedDataset[K, S])(
func: (K, Iterator[V], GroupState[S]) => Iterator[U]): Dataset[U] // (1)!
- Since 3.2.0
Input Arguments¶
flatMapGroupsWithState accepts the following:
- OutputMode
- GroupStateTimeout
- A function with the
Kkey and theVvalues and the current GroupState for the givenKkey - (optionally)
KeyValueGroupedDatasetfor an user-defined initial state
FlatMapGroupsWithState Logical Operator¶
flatMapGroupsWithState creates a Dataset with FlatMapGroupsWithState logical operator with the following:
LogicalGroupStategroupingAttributesof theKeyValueGroupedDatasetdataAttributesof theKeyValueGroupedDatasetisMapGroupsWithStateflag disabled (false)logicalPlanof theKeyValueGroupedDatasetas the child operator
Output Modes¶
flatMapGroupsWithState supports Append and Update output modes only and throws an IllegalArgumentException otherwise:
The output mode of function should be append or update
Note
An OutputMode is a required argument, but does not seem to be used at all. Check out the question What's the purpose of OutputMode in flatMapGroupsWithState? How/where is it used? on StackOverflow.