
NarrowDependency

NarrowDependency[T] is an extension of the Dependency abstraction for narrow dependencies (of RDD[T]s) where each partition of the child RDD depends on a small number of partitions of the parent RDD.
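
To make "a small number of partitions" concrete, here is a minimal plain-Scala sketch of the contract that runs without Spark. The names NarrowDep and PairwiseDep are hypothetical and exist only for this illustration; they are not Spark classes. The sketch assumes a child partition that reads two adjacent parent partitions.

```scala
// Plain-Scala model of the NarrowDependency contract (no Spark required).
// NarrowDep and PairwiseDep are hypothetical names used only for illustration.
abstract class NarrowDep {
  /** Parent partitions that the given child partition depends on. */
  def getParents(partitionId: Int): Seq[Int]
}

// Hypothetical dependency: child partition i reads parent partitions 2i and 2i + 1,
// dropping any index beyond the parent's partition count.
final class PairwiseDep(numParentPartitions: Int) extends NarrowDep {
  override def getParents(partitionId: Int): Seq[Int] =
    Seq(2 * partitionId, 2 * partitionId + 1).filter(_ < numParentPartitions)
}

object NarrowDepDemo {
  def main(args: Array[String]): Unit = {
    val dep = new PairwiseDep(numParentPartitions = 5)
    (0 until 3).foreach { child =>
      println(s"child $child <- parents ${dep.getParents(child).mkString(", ")}")
    }
  }
}
```

With five parent partitions, child 0 maps to parents 0 and 1, child 1 to parents 2 and 3, and child 2 to parent 4 only. Each child partition needs a fixed, small set of parent partitions, which is the property that separates a narrow dependency from a shuffle.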

Contract

getParents

getParents(
  partitionId: Int): Seq[Int]

The parent partitions for a given child partition

Used when:

Implementations

OneToOneDependency

OneToOneDependency is a NarrowDependency with getParents returning a single-element collection with the given partitionId.

val myRdd = sc.parallelize(0 to 9).map((_, 1))

scala> :type myRdd
org.apache.spark.rdd.RDD[(Int, Int)]

scala> myRdd.dependencies.foreach(println)
org.apache.spark.OneToOneDependency@801fe56

import org.apache.spark.OneToOneDependency
// The dependency is typed by the parent RDD's elements, so use a wildcard here
val dep = myRdd.dependencies.head.asInstanceOf[OneToOneDependency[_]]

scala> println(dep.getParents(0))
List(0)

scala> println(dep.getParents(1))
List(1)

PruneDependency

PruneDependency is a NarrowDependency that represents a dependency between a PartitionPruningRDD and its parent RDD, where the child RDD contains only a subset of the parent's partitions.
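
The pruning idea can be sketched in plain Scala as follows. This is a simplified model under stated assumptions, not Spark's source: PruneModel and its keep parameter are hypothetical names, and the model assumes the kept parent partitions are renumbered from 0 in their original order.

```scala
// Simplified, Spark-free model of partition pruning.
// PruneModel and `keep` are hypothetical names used only for illustration.
final class PruneModel(numParentPartitions: Int, keep: Int => Boolean) {
  // Parent partition indices that survive the filter, in their original order.
  private val kept: Vector[Int] = (0 until numParentPartitions).filter(keep).toVector

  /** Number of partitions in the pruned (child) RDD. */
  def numPartitions: Int = kept.length

  /** Each child partition depends on exactly one surviving parent partition. */
  def getParents(partitionId: Int): Seq[Int] = List(kept(partitionId))
}

object PruneModelDemo {
  def main(args: Array[String]): Unit = {
    // Keep only the odd-numbered partitions of a 6-partition parent.
    val dep = new PruneModel(numParentPartitions = 6, keep = _ % 2 == 1)
    (0 until dep.numPartitions).foreach { child =>
      println(s"child $child <- parent ${dep.getParents(child).head}")
    }
  }
}
```

In this model the child has three partitions that map to parent partitions 1, 3 and 5, so every child partition still has a single parent partition even though the partition IDs no longer match.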

RangeDependency

RangeDependency is a NarrowDependency that represents a one-to-one dependency between ranges of partitions in the parent and child RDDs.

Used in UnionRDD (SparkContext.union).

val r1 = sc.range(0, 4)
val r2 = sc.range(5, 9)

val unioned = sc.union(r1, r2)

scala> unioned.dependencies.foreach(println)
org.apache.spark.RangeDependency@76b0e1d9
org.apache.spark.RangeDependency@3f3e51e0

import org.apache.spark.RangeDependency
// sc.range produces RDD[Long], so the dependency is a RangeDependency[Long]
val dep = unioned.dependencies.head.asInstanceOf[RangeDependency[Long]]

scala> println(dep.getParents(0))
List(0)
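
The range mapping can be modelled in plain Scala to show why a union needs one RangeDependency per parent. This is a sketch under assumptions: RangeModel is a hypothetical stand-in, and its three parameters (the first parent partition, the first child partition, and the length of the range) are assumed to mirror how a range dependency is described above. The partition counts 2 and 3 are made up for the example.

```scala
// Spark-free model of a range-based narrow dependency.
// RangeModel is a hypothetical name used only for illustration.
final case class RangeModel(inStart: Int, outStart: Int, length: Int) {
  /** The parent partition for a child partition inside the range, else nothing. */
  def getParents(partitionId: Int): List[Int] =
    if (partitionId >= outStart && partitionId < outStart + length)
      List(partitionId - outStart + inStart)
    else Nil
}

object RangeModelDemo {
  // Union of a 2-partition parent and a 3-partition parent: 5 child partitions.
  val first: RangeModel = RangeModel(inStart = 0, outStart = 0, length = 2)
  val second: RangeModel = RangeModel(inStart = 0, outStart = 2, length = 3)

  def main(args: Array[String]): Unit =
    (0 until 5).foreach { child =>
      println(s"child $child: first=${first.getParents(child)}, second=${second.getParents(child)}")
    }
}
```

Child partitions 0 and 1 resolve through the first range only, while child partitions 2 to 4 resolve through the second range as parent partitions 0 to 2. A child partition outside a range yields an empty list for that dependency.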

Creating Instance

NarrowDependency takes the following to be created:

The parent RDD (RDD[T])

Abstract Class

NarrowDependency is an abstract class and cannot be created directly. It is created indirectly through its concrete implementations.