Partitioner is an abstraction to define how the elements in a key-value pair RDD are partitioned by key. Partitioner maps keys to partition IDs (from 0 to numPartitions - 1).

Partitioner is used to ensure that records for a given key have to reside on a single partition.

Available Partitioners

Partitioner Description


Hash-based partitioning


numPartitions Method

numPartitions: Int

numPartitions is the number of partition to use for mapping keys to partition IDs.

numPartitions is used when…​FIXME

getPartition Method

getPartition(key: Any): Int

getPartition maps a given key to a partition ID (from 0 to numPartitions - 1)

getPartition is used when…​FIXME

defaultPartitioner Method

  rdd: RDD[_],
  others: RDD[_]*): Partitioner


defaultPartitioner is used when…​FIXME