FlatMapGroupsWithState Logical Operator¶
FlatMapGroupsWithState is a binary logical operator (Spark SQL) that represents the following high-level operators in a logical query plan of a structured query:
- KeyValueGroupedDataset.mapGroupsWithState
- KeyValueGroupedDataset.flatMapGroupsWithState
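As a minimal sketch of how the operator ends up in a logical plan, the following batch query calls KeyValueGroupedDataset.flatMapGroupsWithState and prints the logical plan. The object name, the `countValues` state function, and the sample data are hypothetical and only for illustration; it assumes Spark SQL is on the classpath and a local master is acceptable.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}
import org.apache.spark.sql.streaming.{GroupState, GroupStateTimeout, OutputMode}

object LogicalPlanSketch {
  // Hypothetical state function: adds the number of values seen to the per-key state.
  def countValues(
      key: String,
      values: Iterator[Int],
      state: GroupState[Long]): Iterator[(String, Long)] = {
    val total = state.getOption.getOrElse(0L) + values.size
    state.update(total)
    Iterator((key, total))
  }

  // Builds a small batch Dataset that uses flatMapGroupsWithState.
  def buildQuery(spark: SparkSession): Dataset[(String, Long)] = {
    import spark.implicits._
    Seq(("a", 1), ("a", 2), ("b", 3)).toDS()
      .groupByKey(_._1)
      .mapValues(_._2)
      .flatMapGroupsWithState[Long, (String, Long)](
        OutputMode.Update(), GroupStateTimeout.NoTimeout())(countValues _)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("LogicalPlanSketch")
      .getOrCreate()
    // The printed tree should include a FlatMapGroupsWithState node.
    println(buildQuery(spark).queryExecution.logical.numberedTreeString)
    spark.stop()
  }
}
```

Swapping in mapGroupsWithState (with a state function that returns a single value rather than an iterator) should produce the same logical operator with the isMapGroupsWithState flag enabled.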
Creating Instance¶
FlatMapGroupsWithState takes the following to be created:
- State function ((Any, Iterator[Any], LogicalGroupState[Any]) => Iterator[Any])
- Catalyst Expression for keys
- Catalyst Expression for values
- Grouping Attributes
- Data Attributes
- Output Object Attribute
- State ExpressionEncoder
- OutputMode
- isMapGroupsWithState flag (default: false)
- GroupStateTimeout
- hasInitialState flag (default: false)
- Initial State Group Attributes
- Initial State Data Attributes
- Initial State Deserializer
- LogicalPlan of the initial state
- Child logical operator
Creating FlatMapGroupsWithState (under SerializeFromObject)¶
apply[K: Encoder, V: Encoder, S: Encoder, U: Encoder](
func: (Any, Iterator[Any], LogicalGroupState[Any]) => Iterator[Any],
groupingAttributes: Seq[Attribute],
dataAttributes: Seq[Attribute],
outputMode: OutputMode,
isMapGroupsWithState: Boolean,
timeout: GroupStateTimeout,
child: LogicalPlan): LogicalPlan
apply[K: Encoder, V: Encoder, S: Encoder, U: Encoder](
func: (Any, Iterator[Any], LogicalGroupState[Any]) => Iterator[Any],
groupingAttributes: Seq[Attribute],
dataAttributes: Seq[Attribute],
outputMode: OutputMode,
isMapGroupsWithState: Boolean,
timeout: GroupStateTimeout,
child: LogicalPlan,
initialStateGroupAttrs: Seq[Attribute],
initialStateDataAttrs: Seq[Attribute],
initialState: LogicalPlan): LogicalPlan
apply creates a SerializeFromObject logical operator (an object consumer, i.e. a unary logical operator) with a FlatMapGroupsWithState as its child logical operator.
Internally, apply finds the ExpressionEncoder for the type S and creates a FlatMapGroupsWithState with an UnresolvedDeserializer for each of the types K and V. apply then wraps this FlatMapGroupsWithState in the SerializeFromObject object consumer, which is the logical plan it returns.
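The shape of the plan that apply returns can be modelled with a toy sketch in plain Scala. None of the classes below are Spark classes: `ToyFlatMapGroupsWithState`, `ToySerializeFromObject`, `Relation`, and `toyApply` are hypothetical stand-ins, and strings take the place of encoders and deserializers. The serializer for the output type U is an assumption made for illustration.

```scala
object ApplySketch {
  // Toy stand-ins for Catalyst plan nodes; none of these are Spark classes.
  sealed trait Plan
  final case class Relation(name: String) extends Plan
  final case class ToyFlatMapGroupsWithState(
      keyDeserializer: String,
      valueDeserializer: String,
      stateEncoder: String,
      child: Plan) extends Plan
  final case class ToySerializeFromObject(serializer: String, child: Plan) extends Plan

  // Mirrors the described steps: look up the state encoder (type S),
  // create the operator with unresolved deserializers for K and V,
  // then wrap it in a serializing parent operator.
  def toyApply(
      keyType: String,
      valueType: String,
      stateType: String,
      outputType: String,
      child: Plan): Plan = {
    val stateEncoder = s"ExpressionEncoder[$stateType]"
    val mapped = ToyFlatMapGroupsWithState(
      keyDeserializer = s"UnresolvedDeserializer[$keyType]",
      valueDeserializer = s"UnresolvedDeserializer[$valueType]",
      stateEncoder = stateEncoder,
      child = child)
    // Assumption: the parent serializes objects of the output type U.
    ToySerializeFromObject(serializer = s"Serializer[$outputType]", child = mapped)
  }

  def main(args: Array[String]): Unit = {
    val plan = toyApply("String", "Int", "Long", "(String, Long)", Relation("events"))
    println(plan)
  }
}
```

The point of the model is only the nesting: the stateful operator is never the root of the returned plan, because its output objects still have to be serialized back to rows.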
Execution Planning¶
FlatMapGroupsWithState is resolved (planned) to:
- FlatMapGroupsWithStateExec physical operator for streaming queries (by FlatMapGroupsWithStateStrategy execution planning strategy)
- CoGroupExec or MapGroupsExec physical operators for batch queries (by BasicOperators execution planning strategy)
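For the batch case, the planning result can be inspected through a Dataset's QueryExecution. The sketch below is a hedged illustration, not part of Spark: the object name, the `sumValues` state function, and the sample data are hypothetical, and it assumes Spark SQL is on the classpath. It collects the names of any MapGroupsExec or CoGroupExec nodes found in the physical plan of a batch query that uses mapGroupsWithState.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}
import org.apache.spark.sql.execution.{CoGroupExec, MapGroupsExec}
import org.apache.spark.sql.streaming.GroupState

object BatchPlanningSketch {
  // Hypothetical state function: keeps a running sum per key.
  def sumValues(key: String, values: Iterator[Int], state: GroupState[Int]): (String, Int) = {
    val total = state.getOption.getOrElse(0) + values.sum
    state.update(total)
    (key, total)
  }

  // Builds a small batch Dataset that uses mapGroupsWithState.
  def buildQuery(spark: SparkSession): Dataset[(String, Int)] = {
    import spark.implicits._
    Seq(("a", 1), ("a", 2), ("b", 3)).toDS()
      .groupByKey(_._1)
      .mapValues(_._2)
      .mapGroupsWithState[Int, (String, Int)](sumValues _)
  }

  // Names of the batch physical operators the logical operator was planned to.
  def groupOperators(ds: Dataset[_]): Seq[String] =
    ds.queryExecution.sparkPlan.collect {
      case op @ (_: MapGroupsExec | _: CoGroupExec) => op.nodeName
    }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("BatchPlanningSketch")
      .getOrCreate()
    println(groupOperators(buildQuery(spark)).mkString(", "))
    spark.stop()
  }
}
```

A streaming query would go through FlatMapGroupsWithStateStrategy instead, so the same check is not expected to find these batch operators there.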
ObjectProducer¶
FlatMapGroupsWithState is an ObjectProducer (Spark SQL).