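NewHadoopRDD is an RDD of key-value pairs (K keys and V values) read through Hadoop's new MapReduce API (org.apache.hadoop.mapreduce).

NewHadoopRDD is created when the following are executed: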

  • SparkContext.newAPIHadoopFile

  • SparkContext.newAPIHadoopRDD

  • (indirectly) SparkContext.binaryFiles

  • (indirectly) SparkContext.wholeTextFiles

NewHadoopRDD is the base RDD of BinaryFileRDD and WholeTextFileRDD.
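
The following is a minimal sketch, not taken from the Spark sources, of getting a NewHadoopRDD through SparkContext.newAPIHadoopFile. It assumes Spark and the Hadoop client libraries are on the classpath, a local master, and a hypothetical text file named input.txt; the object name and the choice of TextInputFormat are illustrative only.

```scala
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
import org.apache.spark.{SparkConf, SparkContext}

object NewAPIHadoopFileSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: a local master is good enough for a demo
    val conf = new SparkConf()
      .setAppName("NewAPIHadoopFileSketch")
      .setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // Hypothetical input path; replace with a file that exists
      val path = "input.txt"

      // K = LongWritable (byte offset of a line), V = Text (the line itself)
      val rdd = sc.newAPIHadoopFile(
        path,
        classOf[TextInputFormat],
        classOf[LongWritable],
        classOf[Text])

      // Print the runtime class of the RDD that was created
      println(rdd.getClass.getName)

      // Hadoop record readers may reuse Writable objects, so convert
      // the values to immutable Strings before collecting them
      val lines = rdd.map { case (_, text) => text.toString }
      lines.take(5).foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

Converting the Text values to String before take is a defensive choice: collecting the raw Writable objects can yield repeated references to the same reused instance.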

getPreferredLocations Method


Creating NewHadoopRDD Instance

NewHadoopRDD takes the following when created:

  • SparkContext

  • Hadoop's InputFormat[K, V] class

  • K key class

  • V value class

  • transient Hadoop Configuration

NewHadoopRDD initializes the internal registries and counters.
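
As an illustration of the arguments listed above, here is a hedged sketch that constructs a NewHadoopRDD directly. In practice the SparkContext methods are the usual entry points; this sketch assumes the NewHadoopRDD constructor is accessible as a developer API in the Spark version in use. The input path input.txt and the object name are hypothetical.

```scala
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.Job
import org.apache.hadoop.mapreduce.lib.input.{FileInputFormat, TextInputFormat}
import org.apache.spark.rdd.NewHadoopRDD
import org.apache.spark.{SparkConf, SparkContext}

object NewHadoopRDDDirectSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("NewHadoopRDDDirectSketch").setMaster("local[*]"))
    try {
      // Register the (hypothetical) input path in a copy of the Hadoop
      // Configuration; the InputFormat reads the path from there
      val job = Job.getInstance(sc.hadoopConfiguration)
      FileInputFormat.addInputPath(job, new Path("input.txt"))

      // Arguments mirror the list above: SparkContext, InputFormat class,
      // key class, value class, Hadoop Configuration
      val rdd = new NewHadoopRDD[LongWritable, Text](
        sc,
        classOf[TextInputFormat],
        classOf[LongWritable],
        classOf[Text],
        job.getConfiguration)

      // One RDD partition is created per Hadoop input split
      println(rdd.getNumPartitions)
    } finally {
      sc.stop()
    }
  }
}
```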