StateStoreAwareZipPartitionsRDD¶
StateStoreAwareZipPartitionsRDD
is a ZippedPartitionsBaseRDD
(Spark Core) with the rdd1 and rdd2 parent RDD
s.
class StateStoreAwareZipPartitionsRDD[A: ClassTag, B: ClassTag, V: ClassTag]
extends ZippedPartitionsBaseRDD[V]
StateStoreAwareZipPartitionsRDD
is used to execute the following physical operators (using StateStoreAwareZipPartitionsHelper implicit class):
Creating Instance¶
StateStoreAwareZipPartitionsRDD
takes the following to be created:
-
SparkContext
(Spark Core) - Process Partitions Function
- Left
RDD[A]
- Right
RDD[B]
- StatefulOperatorStateInfo
- Names of StateStores
- StateStoreCoordinatorRef
StateStoreAwareZipPartitionsRDD
is created when:
StateStoreAwareZipPartitionsHelper
is requested to stateStoreAwareZipPartitions
Process Partitions Function¶
f: (Int, Iterator[A], Iterator[B]) => Iterator[V]
StateStoreAwareZipPartitionsRDD
is given a function when created that is used to process (join) rows of two partitions of the left and right RDDs.
Physical Operator | Function |
---|---|
FlatMapGroupsWithStateExec | processDataWithPartition |
StreamingSymmetricHashJoinExec | processPartitions |
Computing Partition¶
compute(
s: Partition,
context: TaskContext): Iterator[V]
compute
is part of the RDD
(Spark Core) abstraction.
compute
executes the process partitions function with the following: