HashClusteredDistribution
HashClusteredDistribution is a Distribution.md[Distribution].
[[requiredNumPartitions]] HashClusteredDistribution specifies None for the Distribution.md#requiredNumPartitions[required number of partitions].
Note
None for the required number of partitions means that any number of partitions can be used (possibly the value of the spark.sql.shuffle.partitions configuration property).
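As a rough illustration of what None means here, the sketch below resolves an optional required number of partitions to a concrete value. It is not Spark code: NumPartitionsSketch and resolveNumPartitions are hypothetical names, and the constant only stands in for the value of spark.sql.shuffle.partitions (200 by default).

```scala
// Hypothetical helper, not part of Spark.
object NumPartitionsSketch {
  // Assumed stand-in for the value of spark.sql.shuffle.partitions (Spark's default is 200).
  val defaultShufflePartitions: Int = 200

  // None: no specific number is required, so fall back to the default.
  // Some(n): exactly n partitions are required.
  def resolveNumPartitions(required: Option[Int]): Int =
    required.getOrElse(defaultShufflePartitions)
}
```

With None (as HashClusteredDistribution specifies), the caller is free to pick the default. A distribution that specified Some(8) would pin the result to 8.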
HashClusteredDistribution is created as the required child distribution of the following physical operators: CoGroupExec, ShuffledHashJoinExec.md[ShuffledHashJoinExec], SortMergeJoinExec.md[SortMergeJoinExec] and Spark Structured Streaming's StreamingSymmetricHashJoinExec.
[[creating-instance]][[expressions]] HashClusteredDistribution takes hash expressions/Expression.md[expressions] when created.
HashClusteredDistribution requires that the hash expressions are not empty (i.e. not Nil).
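The non-empty requirement on the hash expressions can be pictured with a small stand-alone model. This is a minimal sketch, not Spark's class: Expr and HashClusteredDistributionSketch are hypothetical names, and Expr only stands in for a Catalyst expression.

```scala
// Hypothetical stand-in for a Catalyst expression.
final case class Expr(name: String)

// Hypothetical model of a distribution that rejects empty hash expressions.
final case class HashClusteredDistributionSketch(expressions: Seq[Expr]) {
  // Fails with IllegalArgumentException when no hash expressions are given.
  require(expressions.nonEmpty, "The hash expressions should not be empty (Nil)")
}
```

Creating the model with Nil throws at construction time, so an invalid distribution never reaches the planner in this sketch.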
HashClusteredDistribution is used when:

- EnsureRequirements is executed (for Adaptive Query Execution)
- HashPartitioning is requested to check whether it satisfies a required distribution (satisfies)
=== [[createPartitioning]] createPartitioning Method
[source, scala]
----
createPartitioning(
  numPartitions: Int): Partitioning
----
createPartitioning creates a HashPartitioning for the hash expressions and the given numPartitions.
createPartitioning is part of the Distribution abstraction.
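The relationship between the distribution and the partitioning it creates can be sketched with a simplified model. Everything in CreatePartitioningSketch is hypothetical and independent of Spark's classes. In particular, partitionFor uses the JVM hashCode purely for illustration, whereas Spark's HashPartitioning computes a non-negative modulo of a Murmur3 hash of the expressions.

```scala
// Simplified, hypothetical model; none of these types are Spark's own.
object CreatePartitioningSketch {
  final case class KeyExpr(name: String)

  sealed trait PartitioningModel { def numPartitions: Int }

  final case class HashPartitioningModel(
      expressions: Seq[KeyExpr],
      numPartitions: Int) extends PartitioningModel {
    // Maps key values to a partition id in [0, numPartitions).
    // Assumption: hashCode replaces Spark's Murmur3 hash for brevity.
    def partitionFor(keys: Seq[Any]): Int =
      java.lang.Math.floorMod(keys.hashCode, numPartitions)
  }

  final case class HashClusteredDistributionModel(expressions: Seq[KeyExpr]) {
    require(expressions.nonEmpty, "The hash expressions should not be empty (Nil)")

    // None: any number of partitions is acceptable.
    val requiredNumPartitions: Option[Int] = None

    // Pairs the distribution's hash expressions with the requested number of partitions.
    def createPartitioning(numPartitions: Int): PartitioningModel = {
      assert(requiredNumPartitions.forall(_ == numPartitions))
      HashPartitioningModel(expressions, numPartitions)
    }
  }
}
```

In this model, the same key values always map to the same partition id, which is the property that lets operators such as joins co-locate matching rows.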