ClusteredDistribution¶
ClusteredDistribution is a Distribution that <
ClusteredDistribution requires that the <Nil).
ClusteredDistribution is <
-
MapGroupsExec, HashAggregateExec, ObjectHashAggregateExec, SortAggregateExec, WindowExec -
Spark Structured Streaming's
FlatMapGroupsWithStateExec,StateStoreRestoreExec,StateStoreSaveExec,StreamingDeduplicateExec,StreamingSymmetricHashJoinExec,StreamingSymmetricHashJoinExec -
SparkR's
FlatMapGroupsInRExec -
PySpark's
FlatMapGroupsInPandasExec
ClusteredDistribution is used when:
-
DataSourcePartitioning,SinglePartition,HashPartitioning, andRangePartitioningare requested tosatisfies -
EnsureRequirements is executed for Adaptive Query Execution
=== [[createPartitioning]] createPartitioning Method
[source, scala]¶
createPartitioning(numPartitions: Int): Partitioning¶
createPartitioning creates a HashPartitioning for the <numPartitions.
createPartitioning reports an AssertionError when the <numPartitions.
[options="wrap"]
This ClusteredDistribution requires [requiredNumPartitions] partitions, but the actual number of partitions is [numPartitions].
createPartitioning is part of the Distribution abstraction.
Creating Instance¶
ClusteredDistribution takes the following to be created:
- [[clustering]] Clustering expressions
- [[requiredNumPartitions]] Required number of partitions (default:
None)
Note
None for the required number of partitions indicates to use any number of partitions (possibly spark.sql.shuffle.partitions configuration property).