MultiDimClusteringFunctions¶
MultiDimClusteringFunctions utility offers Spark SQL functions for multi-dimensional clustering.
range_partition_id¶
range_partition_id(
  col: Column,
  numPartitions: Int): Column
range_partition_id creates a Column (Spark SQL) with a RangePartitionId unary expression (for the given arguments).
range_partition_id is used when:
- ZOrderClustering utility is used for the clustering expression
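To make the idea of a range partition ID concrete, here is a minimal, dependency-free sketch in plain Scala. It is not the RangePartitionId expression itself: the object name, the method, and the hard-coded upper bounds are all hypothetical, and how the real expression derives its range boundaries is not covered on this page. The sketch only shows how a value could be mapped to one of numPartitions ranges once the boundaries are known.

```scala
object RangePartitionSketch {
  // Hypothetical helper: maps a value to the index of the first range whose
  // upper bound is greater than or equal to the value. Values above every
  // bound fall into the last partition, so there are upperBounds.length + 1
  // partitions in total.
  def partitionId(value: Int, upperBounds: Seq[Int]): Int = {
    val idx = upperBounds.indexWhere(bound => value <= bound)
    if (idx >= 0) idx else upperBounds.length
  }
}

// Illustrative bounds only: three bounds define four partitions.
val bounds = Seq(10, 20, 30)
val ids = Seq(5, 10, 15, 31).map(RangePartitionSketch.partitionId(_, bounds))
```

Mapping values to small, dense partition IDs like this is what lets a later step combine several columns bit by bit without one column's raw value range dominating the others.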
interleave_bits¶
interleave_bits(
  cols: Column*): Column
interleave_bits creates a Column (Spark SQL) with an InterleaveBits expression (for the expressions of the given columns).
interleave_bits is used when:
- ZOrderClustering utility is used for the clustering expression
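Bit interleaving is the core of Z-order clustering, and a small plain-Scala sketch may help picture it. The code below is an illustration only, not the InterleaveBits expression: the object and method names are hypothetical, and it assumes 32-bit integer inputs interleaved from the most significant bit down, with the result packed into a byte array.

```scala
object InterleaveSketch {
  // Hypothetical helper: interleaves the bits of the given 32-bit values,
  // most significant bit first. The output order is bit 31 of values(0),
  // bit 31 of values(1), ..., then bit 30 of values(0), and so on.
  def interleave(values: Seq[Int]): Array[Byte] = {
    val out = new Array[Byte](values.length * 4) // 4 output bytes per input
    var outBit = 0                               // position in the output bit stream
    for (bit <- 31 to 0 by -1; v <- values) {
      if (((v >>> bit) & 1) == 1) {
        val byteIdx = outBit / 8
        val shift = 7 - (outBit % 8)             // fill each byte from its top bit
        out(byteIdx) = (out(byteIdx) | (1 << shift)).toByte
      }
      outBit += 1
    }
    out
  }
}

// All-zero bits interleaved with all-one bits alternate 0,1,0,1,...
val zValue = InterleaveSketch.interleave(Seq(0, -1))
```

Sorting rows by such an interleaved value keeps rows that are close in every input dimension close together in the sort order, which is the property Z-ordering relies on.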
hilbert_index¶
hilbert_index(
  numBits: Int,
  cols: Column*): Column
hilbert_index creates a Column (Spark SQL) to execute one of the following Expressions (Spark SQL) based on the hilbertBits:
- HilbertLongIndex for up to 64 hilbert bits
- HilbertByteArrayIndex, otherwise
The hilbertBits is the number of columns (cols) multiplied by the number of bits (numBits).
SparkException: Hilbert indexing can only be used on 9 or fewer columns
hilbert_index throws a SparkException for 10 or more columns (cols):
Hilbert indexing can only be used on 9 or fewer columns.
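The selection rule and the column limit described above can be sketched in plain Scala as follows. This is a hedged illustration rather than the actual implementation: the object and method names are hypothetical, the expression names are returned as strings, an IllegalArgumentException stands in for SparkException to keep the example free of Spark dependencies, and "up to 64" is read as inclusive, which is an assumption about the boundary case.

```scala
object HilbertIndexChoice {
  val MaxColumns = 9

  // Hypothetical helper: returns the name of the expression that the rules
  // on this page would select for the given bit width and column count.
  def chooseExpression(numBits: Int, numCols: Int): String = {
    if (numCols > MaxColumns) {
      // Stand-in for the SparkException mentioned on this page.
      throw new IllegalArgumentException(
        "Hilbert indexing can only be used on 9 or fewer columns.")
    }
    val hilbertBits = numCols * numBits
    // Assumption: "up to 64" includes exactly 64 bits.
    if (hilbertBits <= 64) "HilbertLongIndex" else "HilbertByteArrayIndex"
  }
}

val small = HilbertIndexChoice.chooseExpression(numBits = 8, numCols = 3)  // 24 bits
val large = HilbertIndexChoice.chooseExpression(numBits = 10, numCols = 9) // 90 bits
```

The split presumably exists because an index that fits in a single long can be produced and compared more cheaply than a byte array, with the byte-array variant covering everything wider.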
hilbert_index is used when:
- HilbertClustering is requested for the clustering expression