MultiDimClustering

MultiDimClustering is an abstraction of multi-dimensional clustering algorithms (for changing the data layout).

Contract

cluster

cluster(
  df: DataFrame,
  colNames: Seq[String],
  approxNumPartitions: Int): DataFrame

Note

It may be surprising that this method is never actually called. The MultiDimClustering companion object defines a cluster utility that takes the same parameters and simply calls ZOrderClustering.cluster directly. That bypasses the virtual call, which would only matter if there were multiple implementations, yet there is just one.
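
The delegation described in this note can be sketched without Spark. This is only an illustration of the pattern, not the Delta Lake source: Table, Clustering and SingleCurveClustering are hypothetical stand-ins for DataFrame, MultiDimClustering and ZOrderClustering, and the "clustering" here merely records what was requested.

```scala
// Hypothetical stand-in for a DataFrame so the sketch needs no Spark dependency.
final case class Table(name: String, layout: String = "unclustered")

// The contract: in principle several clustering algorithms could implement it.
trait Clustering {
  def cluster(table: Table, colNames: Seq[String], approxNumPartitions: Int): Table
}

// Companion-object utility: callers use this entry point, which forwards
// straight to the single implementation instead of going through the trait.
object Clustering {
  def cluster(table: Table, approxNumPartitions: Int, colNames: Seq[String]): Table =
    SingleCurveClustering.cluster(table, colNames, approxNumPartitions)
}

// The only implementation (hypothetical); it just describes the requested layout.
object SingleCurveClustering extends Clustering {
  override def cluster(table: Table, colNames: Seq[String], approxNumPartitions: Int): Table =
    table.copy(layout = s"clustered by ${colNames.mkString(", ")} into ~$approxNumPartitions partitions")
}
```

Calling Clustering.cluster(Table("events"), 4, Seq("x", "y")) goes through the companion object only; the abstract method on the trait is never invoked by callers.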

Implementations

cluster

cluster(
  df: DataFrame,
  approxNumPartitions: Int,
  colNames: Seq[String]): DataFrame

cluster does Z-Order clustering.
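
As a rough intuition for what Z-order clustering means in general, the following sketch interleaves the bits of two integer coordinates into a single Z-value and sorts points by it. It is not how ZOrderClustering is implemented; ZOrderSketch, interleave and zOrder are hypothetical names, and real column values would first have to be mapped to integers.

```scala
object ZOrderSketch {
  // Interleave the low `bits` bits of x and y into one Z-value (Morton code).
  // Bit i of x lands at position 2*i + 1, bit i of y at position 2*i.
  def interleave(x: Int, y: Int, bits: Int = 16): Long =
    (0 until bits).foldLeft(0L) { (acc, i) =>
      val xBit = ((x >> i) & 1).toLong
      val yBit = ((y >> i) & 1).toLong
      acc | (xBit << (2 * i + 1)) | (yBit << (2 * i))
    }

  // Sorting by Z-value tends to keep points that are close in both
  // dimensions close together in the resulting order.
  def zOrder(points: Seq[(Int, Int)]): Seq[(Int, Int)] =
    points.sortBy { case (x, y) => interleave(x, y) }
}
```

Rows that end up adjacent in this order can then be written to the same files, which is what makes multi-column data skipping effective.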

cluster is used when:

AssertionError

cluster throws an AssertionError when colNames is empty:

Cannot cluster by zero columns!
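
A guard of this kind could look like the sketch below. It assumes the check is made with Scala's standard assert, which the page does not state; AssertSketch and requireColumns are hypothetical names used only for illustration.

```scala
object AssertSketch {
  // Hypothetical guard: fails with java.lang.AssertionError when no columns are given.
  // Assumption: Scala's Predef.assert is used, which prefixes the message
  // with "assertion failed: ".
  def requireColumns(colNames: Seq[String]): Unit =
    assert(colNames.nonEmpty, "Cannot cluster by zero columns!")

  // Returns the error message if the guard fails, or None if it passes.
  def check(colNames: Seq[String]): Option[String] =
    try { requireColumns(colNames); None }
    catch { case e: AssertionError => Some(e.getMessage) }
}
```

Note that Scala assertions can be disabled at compile time, so a check written this way is a developer safeguard rather than user-facing validation.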