MultiDimClustering¶
MultiDimClustering is an abstraction of multi-dimensional clustering algorithms (for changing the data layout).
Contract¶
cluster¶
cluster(
df: DataFrame,
colNames: Seq[String],
approxNumPartitions: Int): DataFrame
Note
It can be surprising to find out that this method is never really used. The reason is that there is a companion object MultiDimClustering with a cluster utility that takes the same parameters (in a different order) and simply calls ZOrderClustering.cluster. The abstraction is designed as if there were multiple implementations, yet there is just one, so the utility bypasses any virtual calls.
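The delegation described in the note can be sketched in plain Scala. This is only an illustration of the pattern, not the Delta Lake source: Table, ClusteringSketch and SingleImplSketch are hypothetical stand-ins for DataFrame, MultiDimClustering and ZOrderClustering, and the "clustering" is a placeholder sort.

```scala
// Hypothetical stand-in for Spark's DataFrame so the sketch needs no Spark dependency.
final case class Table(rows: Vector[Map[String, Int]])

// Stand-in for the MultiDimClustering trait (the contract).
trait ClusteringSketch {
  def cluster(df: Table, colNames: Seq[String], approxNumPartitions: Int): Table
}

// Stand-in for the only implementation (ZOrderClustering in the text).
// Placeholder logic: orders rows by the first clustering column only, NOT real Z-ordering.
object SingleImplSketch extends ClusteringSketch {
  override def cluster(df: Table, colNames: Seq[String], approxNumPartitions: Int): Table =
    colNames.headOption.fold(df)(c => Table(df.rows.sortBy(_.getOrElse(c, 0))))
}

// Companion object with a utility that takes the same parameters in a different order
// and calls the single implementation directly (no dispatch through the trait).
object ClusteringSketch {
  def cluster(df: Table, approxNumPartitions: Int, colNames: Seq[String]): Table =
    SingleImplSketch.cluster(df, colNames, approxNumPartitions)
}
```

Callers in this sketch would use ClusteringSketch.cluster(table, 4, Seq("a")) and never touch the trait method directly, which is the situation the note describes.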
Implementations¶
ZOrderClustering
cluster¶
cluster(
df: DataFrame,
approxNumPartitions: Int,
colNames: Seq[String]): DataFrame
cluster does Z-Order clustering.
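As a rough illustration of the idea behind Z-ordering (and not of how ZOrderClustering itself is implemented), the sketch below interleaves the bits of several integer column values into one sort key. Rows that are close in all columns tend to get nearby keys, so sorting by the key groups them together. ZOrderSketch and interleave are hypothetical names introduced for this example.

```scala
object ZOrderSketch {
  /** Interleaves the lowest `bits` bits of each value, most significant bit first.
    * The first value contributes the more significant bit of every group. */
  def interleave(values: Seq[Int], bits: Int): Long = {
    require(values.nonEmpty, "need at least one value")
    require(bits > 0 && bits * values.length <= 63, "result must fit in a Long")
    var z = 0L
    // Walk bit positions from high to low, taking one bit from each value in turn.
    for (bit <- (bits - 1) to 0 by -1; v <- values) {
      z = (z << 1) | ((v >> bit) & 1)
    }
    z
  }

  def main(args: Array[String]): Unit = {
    // Four points on a 2x2 grid, ordered by their interleaved key.
    val points = Seq((1, 1), (0, 1), (1, 0), (0, 0))
    val ordered = points.sortBy { case (x, y) => interleave(Seq(x, y), bits = 1) }
    println(ordered)
  }
}
```

For example, interleave(Seq(3, 0), 2) combines the bits 11 and 00 into binary 1010, which is 10.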
cluster is used when OptimizeExecutor is requested to runOptimizeBinJob (with the isMultiDimClustering flag enabled).
AssertionError¶
cluster throws an AssertionError when there are no colNames specified:
Cannot cluster by zero columns!
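A minimal sketch of such a guard, assuming it is written with Scala's built-in assert (the text above only states the error type and message). ClusterGuardSketch and requireClusterColumns are hypothetical names used for illustration.

```scala
object ClusterGuardSketch {
  // Hypothetical helper that mirrors only the documented precondition.
  def requireClusterColumns(colNames: Seq[String]): Seq[String] = {
    // Predef.assert throws java.lang.AssertionError when the condition is false.
    assert(colNames.nonEmpty, "Cannot cluster by zero columns!")
    colNames
  }

  def main(args: Array[String]): Unit = {
    try requireClusterColumns(Seq.empty)
    catch {
      // Scala prefixes the message with "assertion failed: ".
      case e: AssertionError => println(e.getMessage)
    }
  }
}
```

With a non-empty column list, such as Seq("a", "b"), the helper returns the columns unchanged.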