Skip to content

StateStoreOps

StateStoreOps is a Scala implicit class of a data RDD (of type RDD[T]) to create a StateStoreRDD for the following physical operators:

Note

Implicit Classes are a language feature in Scala for implicit conversions with extension methods for existing types.

Creating StateStoreRDD (with storeUpdateFunction Aborting StateStore When Task Fails)

mapPartitionsWithStateStore[U: ClassTag](
  sqlContext: SQLContext,
  stateInfo: StatefulOperatorStateInfo,
  keySchema: StructType,
  valueSchema: StructType,
  numColsPrefixKey: Int)(
  storeUpdateFunction: (StateStore, Iterator[T]) => Iterator[U]): StateStoreRDD[T, U]
mapPartitionsWithStateStore[U: ClassTag](
  stateInfo: StatefulOperatorStateInfo,
  keySchema: StructType,
  valueSchema: StructType,
  numColsPrefixKey: Int,
  sessionState: SessionState,
  storeCoordinator: Option[StateStoreCoordinatorRef],
  extraOptions: Map[String, String] = Map.empty)(
  storeUpdateFunction: (StateStore, Iterator[T]) => Iterator[U]): StateStoreRDD[T, U]
numColsPrefixKey
Physical Operator numColsPrefixKey
FlatMapGroupsWithStateExec 0
SessionWindowStateStoreSaveExec numColsForPrefixKey
StateStoreSaveExec 0
StreamingDeduplicateExec 0
StreamingGlobalLimitExec 0

mapPartitionsWithStateStore creates a (wrapper) function to abort the StateStore if state updates had not been committed before a task finished (which is to make sure that the StateStore has been committed or aborted in the end to follow the contract of StateStore).

Note

mapPartitionsWithStateStore uses TaskCompletionListener (Spark Core) to be notified when a task has finished.

In the end, mapPartitionsWithStateStore creates a StateStoreRDD (with the wrapper function, SessionState and StateStoreCoordinatorRef).


mapPartitionsWithStateStore is used when the following physical operators are executed: