HashClusteredDistribution

HashClusteredDistribution is a Distribution.md[Distribution] that <<createPartitioning, creates a HashPartitioning>> for the <<expressions, hash expressions>> and a requested number of partitions.

[[requiredNumPartitions]] HashClusteredDistribution specifies None for the Distribution.md#requiredNumPartitions[required number of partitions].

Note

None for the required number of partitions means that any number of partitions can be used (possibly the value of the spark.sql.shuffle.partitions configuration property).
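The note above can be illustrated with a minimal sketch in plain Scala. It is not Spark's code: `NumPartitionsSketch`, `defaultShufflePartitions` and `resolveNumPartitions` are hypothetical names, and the value 200 stands in for the documented default of spark.sql.shuffle.partitions.

```scala
// Simplified model, not Spark's implementation: resolving an optional
// required number of partitions to a concrete value.
object NumPartitionsSketch {
  // Hypothetical stand-in for the value of spark.sql.shuffle.partitions
  // (200 is the documented default of that property).
  val defaultShufflePartitions: Int = 200

  // None means "no specific requirement", so the fallback value is used.
  // Some(n) means exactly n partitions are required.
  def resolveNumPartitions(required: Option[Int]): Int =
    required.getOrElse(defaultShufflePartitions)
}
```

With this model, `NumPartitionsSketch.resolveNumPartitions(None)` falls back to the configured default, while `resolveNumPartitions(Some(8))` keeps the explicit requirement.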

HashClusteredDistribution is <<creating-instance, created>> when a physical operator is requested for the SparkPlan.md#requiredChildDistribution[required distribution of the child operator(s)], e.g. CoGroupExec, ShuffledHashJoinExec.md[ShuffledHashJoinExec], SortMergeJoinExec.md[SortMergeJoinExec] and Spark Structured Streaming's StreamingSymmetricHashJoinExec.
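As a rough illustration of how a two-child operator could describe its requirements, the following sketch models one required distribution per child, keyed by that side's join keys. `ToyDistribution` and `ToyShuffleJoin` are hypothetical stand-ins that use column names in place of Catalyst expressions. They are not Spark classes.

```scala
// Toy stand-ins, not Spark's classes: expressions are plain column names.
final case class ToyDistribution(expressions: Seq[String])

final case class ToyShuffleJoin(leftKeys: Seq[String], rightKeys: Seq[String]) {
  // One required distribution per child, in child order (left, then right).
  // Each side has to be clustered by its own join keys.
  def requiredChildDistribution: Seq[ToyDistribution] =
    Seq(ToyDistribution(leftKeys), ToyDistribution(rightKeys))
}
```

The point of the design is that both children end up clustered by their join keys, so rows with equal keys can meet in the same partition.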

[[creating-instance]][[expressions]] HashClusteredDistribution takes hash expressions/Expression.md[expressions] when created.

HashClusteredDistribution requires that the <<expressions, hash expressions>> are not empty (i.e. not Nil).
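The non-empty requirement can be sketched with a toy case class. This is an assumption-laden model, not Spark's source: `HashClusteredDistributionSketch` and `NonEmptyCheckDemo` are hypothetical names, and column names stand in for Catalyst expressions.

```scala
import scala.util.Try

// Toy model: expressions are plain column names instead of Expression objects.
final case class HashClusteredDistributionSketch(
    expressions: Seq[String],
    requiredNumPartitions: Option[Int] = None) {
  // Mirrors the documented precondition: hash expressions must not be empty.
  // require throws IllegalArgumentException when the condition is false.
  require(expressions.nonEmpty, "hash expressions should not be empty (Nil)")
}

object NonEmptyCheckDemo {
  // Succeeds: one hash expression is given.
  val ok: Try[HashClusteredDistributionSketch] =
    Try(HashClusteredDistributionSketch(Seq("id")))

  // Fails at construction time: no hash expressions.
  val failed: Try[HashClusteredDistributionSketch] =
    Try(HashClusteredDistributionSketch(Nil))
}
```

Failing at construction time keeps an invalid distribution from ever reaching the planner in this model.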

HashClusteredDistribution is used when:

  • EnsureRequirements is executed (for Adaptive Query Execution)

  • HashPartitioning is requested whether it satisfies a required distribution

=== [[createPartitioning]] createPartitioning Method

[source, scala]
----
createPartitioning(numPartitions: Int): Partitioning
----


createPartitioning creates a HashPartitioning for the <<expressions, hash expressions>> and the input numPartitions.

createPartitioning is part of the Distribution abstraction.
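A minimal sketch of the relationship described above, in plain Scala. All `Toy*` names are hypothetical and this is not Spark's implementation: column names replace Catalyst expressions, `partitionId` uses `hashCode` where Spark uses its own hash function, and the consistency check inside `createPartitioning` is an assumption of the sketch.

```scala
// Toy model of the Distribution / Partitioning relationship (not Spark's classes).
sealed trait ToyPartitioning { def numPartitions: Int }

final case class ToyHashPartitioning(expressions: Seq[String], numPartitions: Int)
    extends ToyPartitioning {
  // Maps the values of the hash expressions for one row to a partition id
  // in the range [0, numPartitions). Equal key values give the same id.
  def partitionId(keyValues: Seq[Any]): Int =
    java.lang.Math.floorMod(keyValues.hashCode(), numPartitions)
}

final case class ToyHashClusteredDistribution(
    expressions: Seq[String],
    requiredNumPartitions: Option[Int] = None) {
  require(expressions.nonEmpty, "hash expressions should not be empty")

  // Pairs the hash expressions with the requested number of partitions.
  def createPartitioning(numPartitions: Int): ToyPartitioning = {
    // Assumption of this sketch: an explicit requirement must match the request.
    require(
      requiredNumPartitions.forall(_ == numPartitions),
      s"expected $requiredNumPartitions partitions but got $numPartitions")
    ToyHashPartitioning(expressions, numPartitions)
  }
}
```

For example, `ToyHashClusteredDistribution(Seq("id")).createPartitioning(4)` yields `ToyHashPartitioning(Seq("id"), 4)`, and any two rows with the same `id` value are assigned the same partition id.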