MultiDimClusteringFunctions¶
The MultiDimClusteringFunctions utility offers Spark SQL functions for multi-dimensional clustering.
range_partition_id¶
range_partition_id(
col: Column,
numPartitions: Int): Column
range_partition_id creates a Column (Spark SQL) with a RangePartitionId unary expression (for the given arguments).
range_partition_id is used when:
- ZOrderClustering utility is used for the clustering expression
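To give a feel for what a range partition ID means, the following is a minimal conceptual sketch in plain Scala. It is not the RangePartitionId expression itself: the `rangeId` helper and the hard-coded bounds are hypothetical, and the sketch simply assumes that a value is mapped to the index of the first range whose upper bound is not smaller than it, so that IDs fall between 0 and numPartitions - 1.

```scala
object RangePartitionIdSketch {
  // Hypothetical helper (not part of Delta Lake).
  // sortedBounds holds numPartitions - 1 upper bounds in ascending order,
  // so the returned id is always in [0, numPartitions).
  def rangeId(value: Int, sortedBounds: Seq[Int]): Int =
    sortedBounds.count(bound => value > bound)

  def main(args: Array[String]): Unit = {
    val bounds = Seq(10, 20, 30) // assumed bounds for 4 partitions
    Seq(5, 10, 15, 25, 99).foreach { v =>
      println(s"$v -> ${rangeId(v, bounds)}")
    }
  }
}
```

Under these assumptions, nearby values receive the same or adjacent IDs, which is the property that makes such IDs usable as input to a clustering expression.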
interleave_bits¶
interleave_bits(
cols: Column*): Column
interleave_bits creates a Column (Spark SQL) with an InterleaveBits expression (for the expressions of the given columns).
interleave_bits is used when:
- ZOrderClustering utility is used for the clustering expression
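The idea behind bit interleaving can be illustrated with a small plain-Scala sketch. This is only a conceptual illustration, not the InterleaveBits expression: the `interleave` helper, its `bitsPerValue` parameter, and the string output are assumptions made for readability. It takes one bit from each input in turn, starting from the most significant bit.

```scala
object InterleaveBitsSketch {
  // Hypothetical helper (not part of Delta Lake).
  // Walks the bit positions from most to least significant and, for each
  // position, appends that bit of every input value in order.
  def interleave(values: Seq[Int], bitsPerValue: Int = 32): String = {
    val sb = new StringBuilder
    for (bit <- (bitsPerValue - 1) to 0 by -1; v <- values) {
      sb.append((v >>> bit) & 1)
    }
    sb.toString
  }

  def main(args: Array[String]): Unit = {
    // 5 = 101 and 2 = 010 in binary, using 3 bits per value
    println(interleave(Seq(5, 2), bitsPerValue = 3))
  }
}
```

Because the leading bits of every input end up at the front of the result, sorting by the interleaved value keeps rows that are close in all dimensions near each other.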
hilbert_index¶
hilbert_index(
numBits: Int,
cols: Column*): Column
hilbert_index creates a Column (Spark SQL) to execute one of the following Expressions (Spark SQL) based on the hilbertBits:
- HilbertLongIndex for up to 64 hilbert bits
- HilbertByteArrayIndex, otherwise
The hilbertBits is the number of columns (cols) multiplied by the number of bits (numBits).
hilbert_index throws a SparkException for 10 or more columns (cols):
Hilbert indexing can only be used on 9 or fewer columns.
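The selection rule and the column limit described above can be sketched in plain Scala as follows. This is a simplified stand-in rather than the actual implementation: `chooseExpression` is a hypothetical helper that returns the expression name as a string, and it uses `require` (which throws an IllegalArgumentException) in place of the SparkException so that the sketch needs no Spark dependency.

```scala
object HilbertIndexSelectionSketch {
  // Hypothetical helper (not part of Delta Lake).
  // numBits: bits per column; numCols: number of columns (cols.length).
  def chooseExpression(numBits: Int, numCols: Int): String = {
    // Stand-in for the SparkException thrown for 10 or more columns
    require(numCols <= 9, "Hilbert indexing can only be used on 9 or fewer columns.")
    // hilbertBits is the number of columns multiplied by the number of bits
    val hilbertBits = numCols * numBits
    if (hilbertBits <= 64) "HilbertLongIndex" else "HilbertByteArrayIndex"
  }

  def main(args: Array[String]): Unit = {
    println(chooseExpression(numBits = 16, numCols = 4)) // 64 hilbert bits
    println(chooseExpression(numBits = 16, numCols = 5)) // 80 hilbert bits
  }
}
```

For example, 4 columns at 16 bits each give exactly 64 hilbert bits and still fit the long-based variant, while a fifth column pushes the total to 80 bits and selects the byte-array variant.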
hilbert_index is used when:
- HilbertClustering is requested for the clustering expression