Skip to content

StateStoreAwareZipPartitionsRDD

StateStoreAwareZipPartitionsRDD is a ZippedPartitionsBaseRDD (Spark Core) with the rdd1 and rdd2 parent RDDs.

class StateStoreAwareZipPartitionsRDD[A: ClassTag, B: ClassTag, V: ClassTag]
extends ZippedPartitionsBaseRDD[V]

StateStoreAwareZipPartitionsRDD is used to execute the following physical operators (using StateStoreAwareZipPartitionsHelper implicit class):

Creating Instance

StateStoreAwareZipPartitionsRDD takes the following to be created:

StateStoreAwareZipPartitionsRDD is created when:

Process Partitions Function

f: (Int, Iterator[A], Iterator[B]) => Iterator[V]

StateStoreAwareZipPartitionsRDD is given a function when created that is used to process (join) rows of two partitions of the left and right RDDs.

Physical Operator Function
FlatMapGroupsWithStateExec processDataWithPartition
StreamingSymmetricHashJoinExec processPartitions

Computing Partition

compute(
  s: Partition,
  context: TaskContext): Iterator[V]

compute is part of the RDD (Spark Core) abstraction.


compute executes the process partitions function with the following:

  • Partition ID
  • Partition records from the rdd1 for the given partition ID
  • Partition records from the rdd2 for the given partition ID