StatefulOpClusteredDistribution
StatefulOpClusteredDistribution is a Distribution (Spark SQL).
StatefulOpClusteredDistribution requires that the Expressions are specified (non-empty) and throws an exception otherwise:
The expressions for hash of a StatefulOpClusteredDistribution should not be Nil.
An AllTuples should be used to represent a distribution that only has a single partition.
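The non-empty requirement can be illustrated with a minimal, self-contained sketch. `ClusteredDistributionSketch` is a hypothetical stand-in, not the Spark class, and expressions are modeled as plain strings rather than Catalyst `Expression`s. It assumes the check behaves like Scala's `require`, which throws an `IllegalArgumentException` when the condition is false.

```scala
import scala.util.Try

// Hypothetical stand-in for illustration only (NOT the Spark class).
// Expressions are modeled as plain strings instead of Catalyst Expressions.
final case class ClusteredDistributionSketch(
    expressions: Seq[String],
    requiredNumPartitions: Int) {
  // Assumption: the check is a `require`, so an empty list fails at construction time.
  require(
    expressions.nonEmpty,
    "The expressions for hash of a StatefulOpClusteredDistribution should not be Nil. " +
      "An AllTuples should be used to represent a distribution that only has a single partition.")
}

// Non-empty expressions: construction succeeds.
val accepted = Try(ClusteredDistributionSketch(Seq("key"), 200))

// Empty expressions: construction fails with an IllegalArgumentException.
val rejected = Try(ClusteredDistributionSketch(Seq.empty, 200))
```

Wrapping the constructor calls in `Try` keeps the sketch runnable end to end, so the failure can be inspected instead of aborting the program.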
Creating Instance
StatefulOpClusteredDistribution takes the following to be created:
- Expressions (Spark SQL)
- Required number of partitions
StatefulOpClusteredDistribution is created when:
- StatefulOperatorPartitioning is requested to getCompatibleDistribution
- StreamingSymmetricHashJoinExec is requested for the required child output distribution
Required Number of Partitions
StatefulOpClusteredDistribution is given a required number of partitions when created.
requiredNumPartitions is part of the Distribution (Spark SQL) abstraction.
Partitioning
createPartitioning(
numPartitions: Int): Partitioning
createPartitioning is part of the Distribution (Spark SQL) abstraction.
createPartitioning asserts that the given numPartitions equals the required number of partitions and throws an exception otherwise:
This StatefulOpClusteredDistribution requires [requiredNumPartitions] partitions,
but the actual number of partitions is [numPartitions].
createPartitioning creates a HashPartitioning (Spark SQL) (with the expressions and the numPartitions).
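The behavior described above can be sketched without a Spark dependency. `StatefulOpDistributionSketch` and `HashPartitioningSketch` are hypothetical stand-ins, not the Spark classes. Expressions are modeled as strings, and the required number of partitions is a plain `Int` for simplicity. The sketch assumes the check is a Scala `assert`, which throws an `AssertionError` on mismatch.

```scala
import scala.util.Try

// Hypothetical stand-ins for illustration only (NOT the Spark classes).
final case class HashPartitioningSketch(expressions: Seq[String], numPartitions: Int)

final case class StatefulOpDistributionSketch(
    expressions: Seq[String],
    requiredNumPartitions: Int) {

  def createPartitioning(numPartitions: Int): HashPartitioningSketch = {
    // The requested partition count must match the required one exactly.
    assert(
      requiredNumPartitions == numPartitions,
      s"This StatefulOpClusteredDistribution requires $requiredNumPartitions partitions, " +
        s"but the actual number of partitions is $numPartitions.")
    // On success, hash-partition by the same expressions into numPartitions partitions.
    HashPartitioningSketch(expressions, numPartitions)
  }
}

val distribution = StatefulOpDistributionSketch(Seq("userId"), requiredNumPartitions = 200)

// Matching partition count: a hash partitioning over the same expressions.
val matching = Try(distribution.createPartitioning(200))

// Mismatched partition count: the assertion fails.
val mismatched = Try(distribution.createPartitioning(100))
```

The exact-match check is stricter than a typical clustered distribution, presumably because state for a stateful operator is laid out per partition, so the partitioning cannot change between runs without invalidating that state.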