ClusteredDistribution¶
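ClusteredDistribution is a Distribution that represents data where rows that share the same values for the clustering expressions are co-located in the same partition.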
ClusteredDistribution requires that the clustering expressions are not empty (i.e. Nil).
ClusteredDistribution is the required child distribution of the following physical operators:
- MapGroupsExec, HashAggregateExec, ObjectHashAggregateExec, SortAggregateExec, WindowExec
- Spark Structured Streaming's FlatMapGroupsWithStateExec, StateStoreRestoreExec, StateStoreSaveExec, StreamingDeduplicateExec, StreamingSymmetricHashJoinExec
- SparkR's FlatMapGroupsInRExec
- PySpark's FlatMapGroupsInPandasExec
ClusteredDistribution is used when:
- DataSourcePartitioning, SinglePartition, HashPartitioning, and RangePartitioning are requested to satisfies
- EnsureRequirements is executed for Adaptive Query Execution
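
To illustrate the first case, the following is a minimal `spark-shell` sketch, assuming a Spark 3.x classpath with default configuration. The attributes `a`, `b` and `c` and the partition count are made up for illustration. Named arguments are used because the parameter list of ClusteredDistribution differs between Spark versions.

```scala
import org.apache.spark.sql.catalyst.expressions.AttributeReference
import org.apache.spark.sql.catalyst.plans.physical.{ClusteredDistribution, HashPartitioning, SinglePartition}
import org.apache.spark.sql.types.LongType

// Hypothetical attributes used as clustering and partitioning expressions
val a = AttributeReference("a", LongType)()
val b = AttributeReference("b", LongType)()
val c = AttributeReference("c", LongType)()

// Rows with the same (a, b) values are required to be in the same partition
val required = ClusteredDistribution(clustering = Seq(a, b))

// Hash partitioning on a subset of the clustering expressions
val byA = HashPartitioning(Seq(a), 8).satisfies(required)

// Hash partitioning on an expression outside the clustering expressions
val byC = HashPartitioning(Seq(c), 8).satisfies(required)

// A single partition trivially co-locates all rows
val single = SinglePartition.satisfies(required)

println(s"byA=$byA byC=$byC single=$single")
```

With the default settings assumed here, `byA` and `single` should hold while `byC` should not, since rows with equal `a` and `b` values may still differ in `c`.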
 
=== [[createPartitioning]] createPartitioning Method
[source, scala]
createPartitioning(numPartitions: Int): Partitioning
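createPartitioning creates a HashPartitioning for the clustering expressions and the given numPartitions.
createPartitioning reports an AssertionError when the required number of partitions is defined and differs from the given numPartitions: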
[options="wrap"]
This ClusteredDistribution requires [requiredNumPartitions] partitions, but the actual number of partitions is [numPartitions].
createPartitioning is part of the Distribution abstraction.
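
The two outcomes of createPartitioning can be tried in `spark-shell`. This is a minimal sketch, assuming a Spark 3.x classpath; the `id` attribute and the partition counts are hypothetical and only serve the illustration.

```scala
import org.apache.spark.sql.catalyst.expressions.AttributeReference
import org.apache.spark.sql.catalyst.plans.physical.{ClusteredDistribution, HashPartitioning}
import org.apache.spark.sql.types.LongType

// Hypothetical clustering expression (any Catalyst expression would do)
val id = AttributeReference("id", LongType)()

// No required number of partitions: any numPartitions is accepted
val anyNumber = ClusteredDistribution(clustering = Seq(id))
val partitioning = anyNumber.createPartitioning(numPartitions = 8)
println(partitioning == HashPartitioning(Seq(id), 8))

// Required number of partitions is 4: asking for 8 fails the assertion
val exactlyFour = ClusteredDistribution(clustering = Seq(id), requiredNumPartitions = Some(4))
val message =
  try {
    exactlyFour.createPartitioning(numPartitions = 8)
    "no error"
  } catch {
    case e: AssertionError => e.getMessage
  }
println(message)
```

The second `println` is expected to show the message quoted above with 4 and 8 filled in.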
Creating Instance¶
ClusteredDistribution takes the following to be created:
- [[clustering]] Clustering expressions
- [[requiredNumPartitions]] Required number of partitions (default: None)
Note
None for the required number of partitions means that any number of partitions is acceptable (possibly the value of the spark.sql.shuffle.partitions configuration property).
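
As a minimal sketch of creating instances (assuming a Spark 3.x classpath and `spark-shell`), the snippet below uses a hypothetical `name` attribute and an arbitrary partition count of 200. Named arguments are used because the constructor's parameter list differs between Spark versions.

```scala
import org.apache.spark.sql.catalyst.expressions.AttributeReference
import org.apache.spark.sql.catalyst.plans.physical.ClusteredDistribution
import org.apache.spark.sql.types.StringType

// Hypothetical clustering expression
val name = AttributeReference("name", StringType)()

// Default: no required number of partitions
val flexible = ClusteredDistribution(clustering = Seq(name))
println(flexible.requiredNumPartitions.isEmpty)

// Explicit required number of partitions
val fixed = ClusteredDistribution(clustering = Seq(name), requiredNumPartitions = Some(200))
println(fixed.requiredNumPartitions)

// Empty clustering expressions are rejected at construction time
val rejected =
  try {
    ClusteredDistribution(clustering = Nil)
    false
  } catch {
    case _: IllegalArgumentException => true
  }
println(rejected)
```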