StateStoreOps
StateStoreOps is a Scala implicit class of a data RDD (of type RDD[T]) to create a StateStoreRDD for the following physical operators:
- FlatMapGroupsWithStateExec
- SessionWindowStateStoreSaveExec
- StateStoreRestoreExec
- StateStoreSaveExec
- StreamingDeduplicateExec
- StreamingGlobalLimitExec
Note
Implicit classes are a Scala language feature that adds extension methods to existing types by means of implicit conversions.
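As an illustration of the language feature only, here is a minimal sketch of an implicit class. `PairingOps` and `zipWithNext` are hypothetical names made up for this example; they are not part of Spark. The class wraps a plain `Seq[T]` in much the same way that StateStoreOps wraps an `RDD[T]`.

```scala
object ImplicitClassSketch {
  // Hypothetical extension class (not part of Spark): wraps any Seq[T],
  // similar in shape to how StateStoreOps wraps an RDD[T].
  implicit class PairingOps[T](val data: Seq[T]) {
    // Extension method that becomes callable on every Seq[T]
    // wherever PairingOps is in implicit scope.
    def zipWithNext: Seq[(T, T)] = data.zip(data.drop(1))
  }

  def main(args: Array[String]): Unit = {
    // The compiler rewrites this call to new PairingOps(Seq(1, 2, 3)).zipWithNext
    val pairs = Seq(1, 2, 3).zipWithNext
    println(pairs)
  }
}
```

With the implicit class in scope, `zipWithNext` reads as if it were defined on `Seq` itself, which is the same effect that lets `mapPartitionsWithStateStore` be called directly on a data RDD.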
Creating StateStoreRDD (with storeUpdateFunction Aborting StateStore When Task Fails)
mapPartitionsWithStateStore[U: ClassTag](
  sqlContext: SQLContext,
  stateInfo: StatefulOperatorStateInfo,
  keySchema: StructType,
  valueSchema: StructType,
  numColsPrefixKey: Int)(
  storeUpdateFunction: (StateStore, Iterator[T]) => Iterator[U]): StateStoreRDD[T, U]

mapPartitionsWithStateStore[U: ClassTag](
  stateInfo: StatefulOperatorStateInfo,
  keySchema: StructType,
  valueSchema: StructType,
  numColsPrefixKey: Int,
  sessionState: SessionState,
  storeCoordinator: Option[StateStoreCoordinatorRef],
  extraOptions: Map[String, String] = Map.empty)(
  storeUpdateFunction: (StateStore, Iterator[T]) => Iterator[U]): StateStoreRDD[T, U]
numColsPrefixKey
Physical Operator | numColsPrefixKey |
---|---|
FlatMapGroupsWithStateExec | 0 |
SessionWindowStateStoreSaveExec | numColsForPrefixKey |
StateStoreSaveExec | 0 |
StreamingDeduplicateExec | 0 |
StreamingGlobalLimitExec | 0 |
mapPartitionsWithStateStore creates a (wrapper) function that aborts the StateStore if state updates have not been committed by the time a task finishes. This ensures that the StateStore is either committed or aborted in the end, as the contract of StateStore requires.
Note
mapPartitionsWithStateStore uses TaskCompletionListener (Spark Core) to be notified when a task has finished.
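The abort-on-completion wrapper described above can be sketched without a Spark dependency. This is a simplified model under stated assumptions, not the actual Spark code: `FakeStore`, `FakeTaskContext` and `wrap` are hypothetical stand-ins for StateStore, the task completion listener mechanism, and the wrapper function, respectively.

```scala
import scala.collection.mutable.ListBuffer

object AbortOnTaskCompletionSketch {
  // Hypothetical stand-in for StateStore: tracks commit and abort only.
  final class FakeStore {
    var hasCommitted: Boolean = false
    var aborted: Boolean = false
    def commit(): Unit = { hasCommitted = true }
    def abort(): Unit = { aborted = true }
  }

  // Hypothetical stand-in for a task context that runs listeners
  // once the task has finished.
  final class FakeTaskContext {
    private val listeners = ListBuffer.empty[() => Unit]
    def addCompletionListener(f: () => Unit): Unit = { listeners += f }
    def markTaskCompleted(): Unit = listeners.foreach(f => f())
  }

  // Wraps a store update function so that a store left uncommitted
  // is aborted when the task completes.
  def wrap[T, U](ctx: FakeTaskContext)(
      update: (FakeStore, Iterator[T]) => Iterator[U]): (FakeStore, Iterator[T]) => Iterator[U] =
    (store, iter) => {
      ctx.addCompletionListener(() => if (!store.hasCommitted) store.abort())
      update(store, iter)
    }

  def main(args: Array[String]): Unit = {
    val ctx = new FakeTaskContext
    val store = new FakeStore
    // This update function never calls commit(), so the listener aborts the store.
    val wrapped = wrap[Int, Int](ctx)((_, rows) => rows.map(_ * 2))
    val out = wrapped(store, Iterator(1, 2)).toList
    ctx.markTaskCompleted()
    println(s"out=$out aborted=${store.aborted}")
  }
}
```

Registering the listener before running the update function means the store is cleaned up even if the update function throws or simply forgets to commit, so the store always ends up either committed or aborted.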
In the end, mapPartitionsWithStateStore creates a StateStoreRDD (with the wrapper function, SessionState and StateStoreCoordinatorRef).
mapPartitionsWithStateStore is used when the following physical operators are executed: