ZOrderClustering¶
ZOrderClustering is a SpaceFillingCurveClustering for multi-dimensional clustering using a Z-order curve.
Clustering Expression¶
SpaceFillingCurveClustering
getClusteringExpression is part of the SpaceFillingCurveClustering abstraction.
getClusteringExpression creates a range_partition_id function for every Column in the given cols, with the given numRanges as the number of partitions.
Finally, getClusteringExpression applies interleave_bits to the range_partition_id columns and casts the (evaluation) result to StringType.
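To make the bit-interleaving step more concrete, here is a minimal sketch of the general technique in plain Scala (no Spark required). It is not Delta Lake's interleave_bits implementation: the names ZOrderSketch, interleaveBits and toHex are hypothetical, and the sketch assumes every range ID is a 32-bit Int whose bits are consumed round-robin, most significant bit first.

```scala
// Hypothetical helper for illustration only, not part of Delta Lake.
object ZOrderSketch {

  /** Interleaves the bits of the given 32-bit ints, most significant bit first,
    * taking one bit from each input in turn.
    * The result has 4 * inputs.length bytes.
    */
  def interleaveBits(inputs: Seq[Int]): Array[Byte] = {
    val out = new Array[Byte](4 * inputs.length)
    var outBit = 0 // position in the output bit stream (0 = top bit of out(0))
    for (bit <- 31 to 0 by -1; in <- inputs) {
      if (((in >>> bit) & 1) == 1) {
        val idx = outBit / 8
        out(idx) = (out(idx) | (1 << (7 - outBit % 8))).toByte
      }
      outBit += 1
    }
    out
  }

  /** Renders the bytes as a hex string so that Z-values can be compared as text. */
  def toHex(bytes: Array[Byte]): String =
    bytes.map(b => f"${b & 0xff}%02x").mkString
}

// Two "range IDs" (1, 0): the lowest bit pair is 1,0, so the last byte is 0x02.
val z = ZOrderSketch.toHex(ZOrderSketch.interleaveBits(Seq(1, 0)))
// z == "0000000000000002"
```

Rows whose range IDs are close in every dimension end up with Z-values that share a long common prefix, which is why sorting by the interleaved value tends to keep such rows together.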
Demo¶
For some reason, getClusteringExpression is protected[skipping], so let's hop over the fence with the following hack.
Paste the following into spark-shell in :paste -raw mode:
package org.apache.spark.sql.delta.skipping

object protectedHack {
  import org.apache.spark.sql.Column
  def getClusteringExpression(
      cols: Seq[Column], numRanges: Int): Column = {
    ZOrderClustering.getClusteringExpression(cols, numRanges)
  }
}