StateStoreOps¶
StateStoreOps is a Scala implicit class of a data RDD (of type RDD[T]) to create a StateStoreRDD for the following physical operators:
- FlatMapGroupsWithStateExec
- SessionWindowStateStoreSaveExec
- StateStoreRestoreExec
- StateStoreSaveExec
- StreamingDeduplicateExec
- StreamingGlobalLimitExec
Note
Implicit Classes are a language feature in Scala for implicit conversions with extension methods for existing types.
Creating StateStoreRDD (with storeUpdateFunction Aborting StateStore When Task Fails)¶
mapPartitionsWithStateStore[U: ClassTag](
sqlContext: SQLContext,
stateInfo: StatefulOperatorStateInfo,
keySchema: StructType,
valueSchema: StructType,
numColsPrefixKey: Int)(
storeUpdateFunction: (StateStore, Iterator[T]) => Iterator[U]): StateStoreRDD[T, U]
mapPartitionsWithStateStore[U: ClassTag](
stateInfo: StatefulOperatorStateInfo,
keySchema: StructType,
valueSchema: StructType,
numColsPrefixKey: Int,
sessionState: SessionState,
storeCoordinator: Option[StateStoreCoordinatorRef],
extraOptions: Map[String, String] = Map.empty)(
storeUpdateFunction: (StateStore, Iterator[T]) => Iterator[U]): StateStoreRDD[T, U]
numColsPrefixKey
| Physical Operator | numColsPrefixKey |
|---|---|
| FlatMapGroupsWithStateExec | 0 |
| SessionWindowStateStoreSaveExec | numColsForPrefixKey |
| StateStoreSaveExec | 0 |
| StreamingDeduplicateExec | 0 |
| StreamingGlobalLimitExec | 0 |
mapPartitionsWithStateStore creates a (wrapper) function to abort the StateStore if state updates had not been committed before a task finished (which is to make sure that the StateStore has been committed or aborted in the end to follow the contract of StateStore).
Note
mapPartitionsWithStateStore uses TaskCompletionListener (Spark Core) to be notified when a task has finished.
In the end, mapPartitionsWithStateStore creates a StateStoreRDD (with the wrapper function, SessionState and StateStoreCoordinatorRef).
mapPartitionsWithStateStore is used when the following physical operators are executed: