# NarrowDependency
`NarrowDependency[T]` is an extension of the `Dependency` abstraction for narrow dependencies (of `RDD[T]`s), where each partition of the child RDD depends on a small number of partitions of the parent RDD.
## Contract

### getParents
```scala
getParents(
  partitionId: Int): Seq[Int]
```
Gives the parent partitions for a given child partition.

Used when:

* `DAGScheduler` is requested for the preferred locations (of a partition of an RDD)
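The contract can be sketched without Spark as a plain mapping from a child partition to its parent partitions. The trait and object below are hypothetical stand-ins for illustration, not Spark's actual classes:

```scala
// Hypothetical stand-in for the NarrowDependency contract (not Spark's API):
// a narrow dependency answers "which parent partitions does child partition p need?"
trait NarrowDep {
  def getParents(partitionId: Int): Seq[Int]
}

// One-to-one style: child partition i depends on parent partition i only
object OneToOne extends NarrowDep {
  def getParents(partitionId: Int): Seq[Int] = List(partitionId)
}

println(OneToOne.getParents(3)) // List(3)
```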
## Implementations

### OneToOneDependency

`OneToOneDependency` is a `NarrowDependency` with `getParents` returning a single-element collection with the given `partitionId`.
```scala
val myRdd = sc.parallelize(0 to 9).map((_, 1))

scala> :type myRdd
org.apache.spark.rdd.RDD[(Int, Int)]

scala> myRdd.dependencies.foreach(println)
org.apache.spark.OneToOneDependency@801fe56

import org.apache.spark.OneToOneDependency
val dep = myRdd.dependencies.head.asInstanceOf[OneToOneDependency[(_, _)]]

scala> println(dep.getParents(0))
List(0)

scala> println(dep.getParents(1))
List(1)
```
### PruneDependency

`PruneDependency` is a `NarrowDependency` that represents a dependency between a `PartitionPruningRDD` and its parent RDD, with the child using only a subset of the parent's partitions.
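The pruning mapping can be sketched as pure logic. This is a hypothetical sketch based on the description above (child partition `i` depends on the `i`-th parent partition that survives the filter), not Spark's actual implementation:

```scala
// Sketch (hypothetical): keep only the parent partitions whose index passes
// the filter; child partition i then depends on the i-th surviving parent.
def pruneParents(numParentPartitions: Int, keep: Int => Boolean)(partitionId: Int): List[Int] = {
  val survivors = (0 until numParentPartitions).filter(keep)
  List(survivors(partitionId))
}

// Parent has 5 partitions; keep only the even-indexed ones (0, 2, 4).
println(pruneParents(5, _ % 2 == 0)(1)) // List(2)
```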
### RangeDependency

`RangeDependency` is a `NarrowDependency` that represents a one-to-one dependency between ranges of partitions in the parent and child RDDs.

Used in `UnionRDD` (`SparkContext.union`).
```scala
val r1 = sc.range(0, 4)
val r2 = sc.range(5, 9)
val unioned = sc.union(r1, r2)

scala> unioned.dependencies.foreach(println)
org.apache.spark.RangeDependency@76b0e1d9
org.apache.spark.RangeDependency@3f3e51e0

import org.apache.spark.RangeDependency
val dep = unioned.dependencies.head.asInstanceOf[RangeDependency[Long]]

scala> println(dep.getParents(0))
List(0)
```
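The range mapping in the example above can be sketched as a pure function. This is an assumption mirroring the description (a window of `length` child partitions starting at `outStart` maps one-to-one onto parent partitions starting at `inStart`), with partition counts chosen for illustration:

```scala
// Sketch (assumed semantics): child partitions [outStart, outStart + length)
// map one-to-one onto parent partitions starting at inStart.
def rangeParents(inStart: Int, outStart: Int, length: Int)(partitionId: Int): List[Int] =
  if (partitionId >= outStart && partitionId < outStart + length)
    List(partitionId - outStart + inStart)
  else
    Nil

println(rangeParents(0, 0, 4)(0)) // List(0)
println(rangeParents(0, 4, 4)(5)) // List(1)
```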
## Creating Instance

`NarrowDependency` takes the following to be created:

* Parent `RDD` (`RDD[T]`)
## Abstract Class

`NarrowDependency` is an abstract class and cannot be created directly. It is created indirectly via the concrete `NarrowDependency` implementations.