HashClusteredDistribution is a Distribution that creates a HashPartitioning for the hash expressions and a requested number of partitions.

[[requiredNumPartitions]] HashClusteredDistribution specifies None for the required number of partitions.


None for the required number of partitions means that any number of partitions is acceptable (possibly the value of the spark.sql.shuffle.partitions configuration property).
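As a rough illustration of how an optional partition requirement could be resolved, the sketch below falls back to a default when the requirement is None. The object `RequiredPartitionsSketch` and the method `resolveNumPartitions` are hypothetical names introduced here, not Spark APIs, and the fallback merely stands in for the value of spark.sql.shuffle.partitions (200 is its documented default).

```scala
// Hypothetical helper, not part of Spark: resolves an optional partition requirement.
object RequiredPartitionsSketch {
  // Assumed stand-in for spark.sql.shuffle.partitions (documented default: 200).
  val defaultShufflePartitions: Int = 200

  // Some(n) pins the number of partitions; None accepts whatever the fallback is.
  def resolveNumPartitions(
      required: Option[Int],
      fallback: Int = defaultShufflePartitions): Int =
    required.getOrElse(fallback)
}
```

With this sketch, `resolveNumPartitions(Some(8))` keeps the explicit requirement, while `resolveNumPartitions(None)` uses the fallback.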

HashClusteredDistribution is created when physical operators are requested for the required partition requirements of their child operator(s) (e.g. CoGroupExec, ShuffledHashJoinExec, SortMergeJoinExec and Spark Structured Streaming's StreamingSymmetricHashJoinExec).

[[creating-instance]][[expressions]] HashClusteredDistribution takes hash expressions when created.

HashClusteredDistribution requires that the hash expressions are not empty (i.e. not Nil).
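The non-empty requirement can be modeled in plain Scala as shown below. This is a minimal sketch under stated assumptions: `Expr` and `DistributionModel` are hypothetical stand-ins for a Catalyst expression and for HashClusteredDistribution, and the error message is illustrative only.

```scala
import scala.util.Try

object HashExpressionsSketch {
  // Hypothetical stand-in for a Catalyst expression.
  final case class Expr(name: String)

  // Simplified model of the distribution; not Spark's class.
  final case class DistributionModel(
      expressions: Seq[Expr],
      requiredNumPartitions: Option[Int] = None) {
    // Reject an empty list of hash expressions at construction time.
    require(expressions.nonEmpty, "The hash expressions should not be empty (Nil)")
  }

  // A non-empty list of hash expressions is accepted.
  val accepted: DistributionModel = DistributionModel(Seq(Expr("id")))

  // An empty list fails the require check (IllegalArgumentException).
  val rejected: Boolean = Try(DistributionModel(Nil)).isFailure
}
```

In this model the check runs in the constructor, so an invalid distribution can never be observed by later code.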

HashClusteredDistribution is used when:

  • EnsureRequirements is executed (for Adaptive Query Execution)

  • HashPartitioning is requested to check whether it satisfies a required distribution (satisfies)

=== [[createPartitioning]] createPartitioning Method

[source, scala]
----
createPartitioning(numPartitions: Int): Partitioning
----

createPartitioning creates a HashPartitioning for the hash expressions and the input numPartitions.

createPartitioning is part of the Distribution abstraction.
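The relationship between the distribution and the partitioning it creates could be sketched as follows. All types here (`Expr`, `Partitioning`, `HashPartitioningModel`, `DistributionModel`) are simplified, hypothetical stand-ins defined in the snippet itself; they only mirror the description above and are not Spark's own classes.

```scala
object CreatePartitioningSketch {
  // Hypothetical stand-in for a Catalyst expression.
  final case class Expr(name: String)

  // Minimal stand-in for the Partitioning abstraction.
  sealed trait Partitioning {
    def numPartitions: Int
  }

  // Stand-in for HashPartitioning: hash expressions plus a partition count.
  final case class HashPartitioningModel(
      expressions: Seq[Expr],
      numPartitions: Int) extends Partitioning

  // Stand-in for HashClusteredDistribution.
  final case class DistributionModel(expressions: Seq[Expr]) {
    // Hands the hash expressions and the requested partition count
    // over to a new hash partitioning.
    def createPartitioning(numPartitions: Int): Partitioning =
      HashPartitioningModel(expressions, numPartitions)
  }
}
```

For example, calling `createPartitioning(4)` on a model built from expressions `a` and `b` yields a `HashPartitioningModel` over the same two expressions with 4 partitions.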