StateStoreOps¶
[[dataRDD]] StateStoreOps
is a Scala implicit class of a data RDD (of type RDD[T]
) to create a StateStoreRDD for the following physical operators:
Note
Implicit Classes are a language feature in Scala for implicit conversions with extension methods for existing types.
Creating StateStoreRDD (with storeUpdateFunction Aborting StateStore When Task Fails)¶
mapPartitionsWithStateStore[U](
stateInfo: StatefulOperatorStateInfo,
keySchema: StructType,
valueSchema: StructType,
indexOrdinal: Option[Int],
sessionState: SessionState,
storeCoordinator: Option[StateStoreCoordinatorRef])(
storeUpdateFunction: (StateStore, Iterator[T]) => Iterator[U]): StateStoreRDD[T, U]
Internally, mapPartitionsWithStateStore
requests SparkContext
to clean storeUpdateFunction
function.
NOTE: mapPartitionsWithStateStore
uses the <SparkContext
.
NOTE: Function Cleaning is to clean a closure from unreferenced variables before it is serialized and sent to tasks. SparkContext
reports a SparkException
when the closure is not serializable.
mapPartitionsWithStateStore
then creates a (wrapper) function to abort the StateStore
if state updates had not been committed before a task finished (which is to make sure that the StateStore
has been committed or aborted in the end to follow the contract of StateStore
).
NOTE: mapPartitionsWithStateStore
uses TaskCompletionListener
to be notified when a task has finished.
In the end, mapPartitionsWithStateStore
creates a StateStoreRDD (with the wrapper function, SessionState
and StateStoreCoordinatorRef).
mapPartitionsWithStateStore
is used when the following physical operators are executed: