# HilbertClustering¶

`HilbertClustering` is a SpaceFillingCurveClustering for multi-dimensional clustering with hilbert curve.

`HilbertClustering` requires between 2 and up to 9 columns to cluster by.

Singleton Object

`HilbertClustering` is a Scala object which is a class that has exactly one instance. It is created lazily when it is referenced, like a `lazy val`.

## Clustering Expression¶

SpaceFillingCurveClustering
``````getClusteringExpression(
cols: Seq[Column],
numRanges: Int): Column
``````

`getClusteringExpression` is part of the SpaceFillingCurveClustering abstraction.

`getClusteringExpression` creates a `rangeIdCols` as range_partition_id for the given `cols` columns and the `numRanges` number of partitions (buckets).

In the end, `getClusteringExpression` hilbert_index with the following:

• The number of bits being one more than the number of trailing zeros of the int value with at most a single one-bit, in the position of the highest-order ("leftmost") one-bit in the `numRanges` value

Number of Bits Explained

Given `numRanges` is `5`, the position of the highest-order ("leftmost") one-bit is `2`.

``````val numRanges = 5
scala> println(s"\$numRanges in the two's complement binary representation is \${Integer.toBinaryString(numRanges)}")
5 in the two's complement binary representation is 101
``````

Counting positions from left to right, starting from `0`, gives `2` as the position of the highest-order ("leftmost") one-bit.

``````scala> print(s"For \${numRanges}, the int value with at most a single one-bit is \${Integer.highestOneBit(numRanges)}")
For 5, the int value with at most a single one-bit is 4
``````

The int value with at most a single one-bit in the position of the highest-order ("leftmost") one-bit being `2` is `4` (`2^2`).

The number of zero bits following the lowest-order ("rightmost") one-bit in the two's complement binary representation of the int value (`4`) is `2`.

``````scala> println(s"For \${Integer.highestOneBit(numRanges)}, the number of zero bits is \${Integer.numberOfTrailingZeros(Integer.highestOneBit(numRanges))}")
For 4, the number of zero bits is 2
``````

In the end, `getClusteringExpression` uses `3` as the number of bits.

• The range_partition_id columns