MultiDimClusteringFunctions¶

The MultiDimClusteringFunctions utility offers Spark SQL functions for multi-dimensional clustering.

range_partition_id¶

range_partition_id(
  col: Column,
  numPartitions: Int): Column

range_partition_id creates a Column (Spark SQL) with a RangePartitionId unary expression (for the given col and numPartitions).
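The idea behind a range partition ID can be illustrated with a small, dependency-free sketch. It is not the RangePartitionId implementation: the helper name `rangePartitionId` and the precomputed `upperBounds` are assumptions made for illustration, and this page does not describe how the real expression derives its range boundaries.

```scala
object RangePartitionSketch {
  // Hypothetical helper (not part of MultiDimClusteringFunctions).
  // `upperBounds` is assumed to be sorted ascending and to hold
  // numPartitions - 1 inclusive upper bounds, one per range except the last.
  def rangePartitionId(value: Int, upperBounds: Seq[Int]): Int = {
    // Index of the first range whose upper bound covers the value.
    val idx = upperBounds.indexWhere(value <= _)
    // Values above every bound fall into the last partition.
    if (idx == -1) upperBounds.length else idx
  }
}
```

With three bounds (four partitions), `RangePartitionSketch.rangePartitionId(11, Seq(10, 20, 30))` lands in partition 1, so values that are close together receive the same or neighbouring IDs.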

range_partition_id is used when:

interleave_bits¶

interleave_bits(
  cols: Column*): Column

interleave_bits creates a Column (Spark SQL) with an InterleaveBits expression (for the expressions of the given columns).
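Bit interleaving itself can be shown with a minimal sketch in plain Scala. The 32-bit `Int` inputs, the most-significant-bit-first order, and the `BigInt` result are assumptions chosen for illustration; this page does not state the input types or output encoding of InterleaveBits.

```scala
object InterleaveBitsSketch {
  // Hypothetical helper (not part of MultiDimClusteringFunctions).
  // Emits bit 31 of every value, then bit 30 of every value, and so on,
  // so the values contribute to the result in round-robin order.
  def interleave(values: Seq[Int]): BigInt = {
    var result = BigInt(0)
    for (bit <- 31 to 0 by -1; v <- values) {
      // Shift the accumulated bits left and append the next input bit.
      result = (result << 1) | BigInt((v >>> bit) & 1)
    }
    result
  }
}
```

For two inputs 2 (binary 10) and 1 (binary 01), the interleaved low bits are 1001, so `InterleaveBitsSketch.interleave(Seq(2, 1))` is 9. Sorting by such a value keeps rows that are close in every input dimension near each other.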

interleave_bits is used when:

hilbert_index¶

hilbert_index(
  numBits: Int,
  cols: Column*): Column

hilbert_index creates a Column (Spark SQL) to execute one of the following Expressions (Spark SQL) based on the hilbertBits:

hilbertBits is the number of columns (cols) multiplied by the number of bits (numBits). For example, 3 columns with a numBits of 10 give a hilbertBits of 30.

hilbert_index throws a SparkException when given 10 or more columns (cols):

Hilbert indexing can only be used on 9 or fewer columns.
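The hilbertBits arithmetic and the column limit described above can be sketched as follows. This is an illustration only: `HilbertBitsSketch`, `MaxColumns`, and `hilbertBits` are hypothetical names, and `IllegalArgumentException` stands in for SparkException so the sketch needs no Spark dependency.

```scala
object HilbertBitsSketch {
  // Assumed constant mirroring the "9 or fewer columns" limit.
  val MaxColumns = 9

  // Hypothetical helper (not part of MultiDimClusteringFunctions).
  def hilbertBits(numBits: Int, numCols: Int): Int = {
    if (numCols > MaxColumns) {
      // The real function throws a SparkException with this message.
      throw new IllegalArgumentException(
        "Hilbert indexing can only be used on 9 or fewer columns.")
    }
    // Number of columns multiplied by the number of bits per column.
    numCols * numBits
  }
}
```

Calling `HilbertBitsSketch.hilbertBits(10, 3)` gives 30, while passing 10 columns fails with the message above.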

hilbert_index is used when: