ClusteredDistribution
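ClusteredDistribution is a Distribution that creates a HashPartitioning for the clustering expressions and a requested number of partitions.
ClusteredDistribution requires that the clustering expressions are not empty (i.e. not Nil).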
ClusteredDistribution is created when the following physical operators are requested for their required child distribution:
- MapGroupsExec, HashAggregateExec, ObjectHashAggregateExec, SortAggregateExec, WindowExec
- Spark Structured Streaming's FlatMapGroupsWithStateExec, StateStoreRestoreExec, StateStoreSaveExec, StreamingDeduplicateExec, StreamingSymmetricHashJoinExec
- SparkR's FlatMapGroupsInRExec
- PySpark's FlatMapGroupsInPandasExec
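
One way to observe this is to inspect the required child distributions of the physical operators in a query plan. The following is a minimal sketch that can be pasted into spark-shell, assuming a local SparkSession and a Spark version in which aggregate operators request a ClusteredDistribution. The query and the application name are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.plans.physical.ClusteredDistribution

// Assumption: a throwaway local session, for illustration only
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("clustered-distribution-demo")
  .getOrCreate()

// A grouping aggregation is planned with aggregate physical operators
val query = spark.range(10).groupBy("id").count()

// sparkPlan is the physical plan before preparation rules such as EnsureRequirements run
val plan = query.queryExecution.sparkPlan

// Gather every ClusteredDistribution that any operator requires of its children
val clustered = plan
  .collect { case op => op.requiredChildDistribution }
  .flatten
  .collect { case c: ClusteredDistribution => c }

clustered.foreach(println)
```

The sketch reads sparkPlan rather than executedPlan so that the operators are visible directly, without the wrapper that Adaptive Query Execution may add.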
ClusteredDistribution is used when:
- DataSourcePartitioning, SinglePartition, HashPartitioning, and RangePartitioning are requested to satisfies (i.e. to check whether they satisfy a required distribution)
- EnsureRequirements is executed for Adaptive Query Execution
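
The satisfies check can be tried out directly against the Catalyst classes. The following is a minimal sketch for spark-shell, assuming default configuration. The columns a and b are hypothetical and exist only for this example.

```scala
import org.apache.spark.sql.catalyst.expressions.AttributeReference
import org.apache.spark.sql.catalyst.plans.physical.{ClusteredDistribution, HashPartitioning, SinglePartition}
import org.apache.spark.sql.types.LongType

// Hypothetical columns, for illustration only
val a = AttributeReference("a", LongType)()
val b = AttributeReference("b", LongType)()

// Required distribution: rows with equal (a, b) must be co-located
val required = ClusteredDistribution(Seq(a, b))

// Hash partitioning on a subset of the clustering expressions
val hashOnA = HashPartitioning(Seq(a), 8)
val hashOnAOk = hashOnA.satisfies(required)

// A single partition trivially co-locates all rows
val singleOk = SinglePartition.satisfies(required)

// Same clustering, but with a required number of partitions that differs from 8
val requiredFour = ClusteredDistribution(Seq(a, b), requiredNumPartitions = Some(4))
val wrongCountOk = hashOnA.satisfies(requiredFour)

println(s"hashOnA: $hashOnAOk, single: $singleOk, wrong count: $wrongCountOk")
```

The required number of partitions is passed by name because the position of that parameter is not the same in every Spark version.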
createPartitioning Method
createPartitioning(numPartitions: Int): Partitioning
createPartitioning creates a HashPartitioning for the clustering expressions and the input numPartitions.
createPartitioning reports an AssertionError when the required number of partitions is defined and differs from the input numPartitions:
This ClusteredDistribution requires [requiredNumPartitions] partitions, but the actual number of partitions is [numPartitions].
createPartitioning is part of the Distribution abstraction.
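
The behaviour of createPartitioning can be sketched as follows in spark-shell. This is an illustration under assumptions, not a definitive usage pattern: the id column is hypothetical, and the partition counts are arbitrary.

```scala
import org.apache.spark.sql.catalyst.expressions.{AttributeReference, Expression}
import org.apache.spark.sql.catalyst.plans.physical.{ClusteredDistribution, HashPartitioning}
import org.apache.spark.sql.types.LongType

// Hypothetical clustering column, for illustration only
val clustering: Seq[Expression] = Seq(AttributeReference("id", LongType)())

// No required number of partitions: any numPartitions is accepted
val anyNumber = ClusteredDistribution(clustering)
val partitioning = anyNumber.createPartitioning(4)

// Required number of partitions: a matching numPartitions is accepted
val exactlyEight = ClusteredDistribution(clustering, requiredNumPartitions = Some(8))
val matching = exactlyEight.createPartitioning(8)

// A mismatching numPartitions trips the assertion; keep the message for inspection
val failure: Option[String] =
  try {
    exactlyEight.createPartitioning(4)
    None
  } catch {
    case e: AssertionError => Some(e.getMessage)
  }

failure.foreach(println)
```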
Creating Instance
ClusteredDistribution takes the following to be created:
- Clustering expressions
- Required number of partitions (default: None)
Note: None for the required number of partitions means that any number of partitions can be used (possibly the value of the spark.sql.shuffle.partitions configuration property).
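
Creating an instance can be sketched as follows in spark-shell. The user column and the value 200 are assumptions chosen for illustration. The last step shows that empty clustering expressions are rejected at creation time.

```scala
import org.apache.spark.sql.catalyst.expressions.{AttributeReference, Expression}
import org.apache.spark.sql.catalyst.plans.physical.ClusteredDistribution
import org.apache.spark.sql.types.StringType

// Hypothetical clustering column, for illustration only
val clustering: Seq[Expression] = Seq(AttributeReference("user", StringType)())

// Default: no required number of partitions
val anyNumber = ClusteredDistribution(clustering)

// Explicit requirement, passed by name since the parameter list differs between Spark versions
val exactly200 = ClusteredDistribution(clustering, requiredNumPartitions = Some(200))

// Empty clustering expressions fail the requirement check in the constructor
val rejected: Boolean =
  try {
    ClusteredDistribution(Nil)
    false
  } catch {
    case _: IllegalArgumentException => true
  }

println(s"any: ${anyNumber.requiredNumPartitions}, fixed: ${exactly200.requiredNumPartitions}, empty rejected: $rejected")
```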