
Murmur3Hash

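Murmur3Hash is a HashExpression that calculates a 32-bit integer hash code of the given child expressions using the Murmur3 algorithm, starting from a seed (42 by default). A null value leaves the running hash unchanged, so a null input hashes to the seed itself.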
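
To make the calculation concrete, here is a minimal sketch of the 32-bit Murmur3 mixing steps applied to a single integer input, written in plain Scala with no Spark dependency. The object Murmur3Sketch and its methods are hypothetical names used for illustration only; they are not Spark API, and the sketch covers integer values only.

```scala
// A sketch only: Murmur3Sketch and its method names are hypothetical, not Spark API.
object Murmur3Sketch {
  // Constants of the 32-bit (x86_32) Murmur3 variant
  private val C1 = 0xcc9e2d51
  private val C2 = 0x1b873593

  // Scramble a single 32-bit input block
  private def mixK1(k: Int): Int =
    Integer.rotateLeft(k * C1, 15) * C2

  // Fold the scrambled block into the running hash
  private def mixH1(h: Int, k: Int): Int =
    Integer.rotateLeft(h ^ k, 13) * 5 + 0xe6546b64

  // Final avalanche; length is the number of input bytes
  private def fmix(h0: Int, length: Int): Int = {
    var h = h0 ^ length
    h ^= h >>> 16
    h *= 0x85ebca6b
    h ^= h >>> 13
    h *= 0xc2b2ae35
    h ^= h >>> 16
    h
  }

  // Hash one 4-byte integer with the given seed
  def hashInt(input: Int, seed: Int): Int =
    fmix(mixH1(seed, mixK1(input)), 4)

  // Chain the hash across values; None (null) leaves the running hash unchanged
  def hashAll(values: Seq[Option[Int]], seed: Int = 42): Int =
    values.foldLeft(seed) {
      case (h, Some(v)) => hashInt(v, h)
      case (h, None)    => h
    }
}
```

In a Scala REPL, Murmur3Sketch.hashInt(0, 42) evaluates to 933211791 and Murmur3Sketch.hashInt(4, 42) to -397064898, the same values as the hash column in the Demo section, while Murmur3Sketch.hashAll(Seq(None)) returns the seed 42.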

Creating Instance

Murmur3Hash takes the following to be created:

Murmur3Hash is created when:

Demo

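// Run in spark-shell, where spark.implicits._ (toDF, $"...") is already in scope
import org.apache.spark.sql.functions.hash
val data = Seq[Option[Int]](Some(0), None, None, None, Some(4), None)
  .toDF
  .withColumn("hash", hash($"value"))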
scala> data.show
+-----+----------+
|value|      hash|
+-----+----------+
|    0| 933211791|
| null|        42|
| null|        42|
| null|        42|
|    4|-397064898|
| null|        42|
+-----+----------+
scala> data.printSchema
root
 |-- value: integer (nullable = true)
 |-- hash: integer (nullable = false)
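// Use as many partitions as the default seed (42) so that
// null rows, whose hash equals the seed, land in partition 42 % 42 = 0
val defaultSeed = 42

val nonEmptyPartitions = data
  .repartition(numPartitions = defaultSeed, partitionExprs = $"value")
  .mapPartitions { it: Iterator[org.apache.spark.sql.Row] =>
    import org.apache.spark.TaskContext
    // Collect the values of this partition into a comma-separated string
    val ns = it.map(_.get(0)).mkString(",")
    Iterator((TaskContext.getPartitionId, ns))
  }
  .as[(Int, String)]
  .collect
  .filterNot { case (pid, ns) => ns.isEmpty }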

nonEmptyPartitions.foreach { case (pid, ns) => printf("%2s: %s%n", pid, ns) }
 0: null,null,null,null
25: 0
32: 4
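
The null rows ending up in partition 0 can be reasoned about with a small sketch. It assumes the partition id is the non-negative remainder of the hash divided by the number of partitions; PartitionIdSketch and pmod are hypothetical names for illustration, not Spark API.

```scala
// A sketch only: PartitionIdSketch is a hypothetical helper, not Spark API.
object PartitionIdSketch {
  // Non-negative modulo: the result is always in [0, n) for n > 0,
  // even when the hash is negative
  def pmod(hash: Int, n: Int): Int = {
    val r = hash % n
    if (r < 0) r + n else r
  }
}
```

With 42 partitions, a null row whose hash equals the seed 42 gives PartitionIdSketch.pmod(42, 42), which is 0.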