StateStoreAwareZipPartitionsRDD
StateStoreAwareZipPartitionsRDD is a ZippedPartitionsBaseRDD (Spark Core) with the rdd1 and rdd2 parent RDDs.
class StateStoreAwareZipPartitionsRDD[A: ClassTag, B: ClassTag, V: ClassTag]
extends ZippedPartitionsBaseRDD[V]
StateStoreAwareZipPartitionsRDD is used to execute the following physical operators (using StateStoreAwareZipPartitionsHelper implicit class):
- FlatMapGroupsWithStateExec
- StreamingSymmetricHashJoinExec
Creating Instance
StateStoreAwareZipPartitionsRDD takes the following to be created:
- SparkContext (Spark Core)
- Process Partitions Function
- Left RDD[A]
- Right RDD[B]
- StatefulOperatorStateInfo
- Names of StateStores
- StateStoreCoordinatorRef
StateStoreAwareZipPartitionsRDD is created when:
- StateStoreAwareZipPartitionsHelper is requested to stateStoreAwareZipPartitions
Process Partitions Function
f: (Int, Iterator[A], Iterator[B]) => Iterator[V]
When created, StateStoreAwareZipPartitionsRDD is given a function that processes (joins) the rows of two partitions, one from the left RDD and one from the right RDD.
| Physical Operator | Function |
|---|---|
| FlatMapGroupsWithStateExec | processDataWithPartition |
| StreamingSymmetricHashJoinExec | processPartitions |
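To make the shape of the process partitions function concrete, here is a minimal sketch in plain Scala with no Spark dependency. The `Click` and `User` row types and the `joinPartitions` name are hypothetical, chosen only for illustration. The actual functions listed in the table above (processDataWithPartition and processPartitions) also work with state stores, which this sketch does not attempt to model.

```scala
object ProcessPartitionsSketch {
  // Hypothetical row types for illustration only (not part of Spark).
  final case class Click(userId: Int, page: String)
  final case class User(userId: Int, name: String)

  // Same shape as f: (Int, Iterator[A], Iterator[B]) => Iterator[V],
  // here with A = Click, B = User and V = String.
  val joinPartitions: (Int, Iterator[Click], Iterator[User]) => Iterator[String] =
    (partitionId, left, right) => {
      // Materialize the right-side rows so each left row can probe them by key.
      val usersById = right.map(u => u.userId -> u.name).toMap
      // Emit one output row per left row that has a matching right row.
      left.flatMap { click =>
        usersById.get(click.userId).map { name =>
          s"p$partitionId: $name visited ${click.page}"
        }
      }
    }
}
```

Calling `ProcessPartitionsSketch.joinPartitions(0, leftRows, rightRows)` with the iterators of a single pair of partitions yields an iterator over the joined rows of that pair only. The partition ID is passed in as the first argument so the function can tell which partition it is processing.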
Computing Partition
compute(
s: Partition,
context: TaskContext): Iterator[V]
compute is part of the RDD (Spark Core) abstraction.
compute executes the process partitions function with the following: