TaskLocation

TaskLocation represents a placement preference of an RDD partition, i.e. a hint of the location to submit tasks for execution.

TaskLocations are tracked by DAGScheduler for submitting missing tasks of a stage.

TaskLocation is available as preferredLocations of a task.

Every TaskLocation describes the location by host name, but can also carry other location-related metadata.

TaskLocations of an RDD partition are available using the SparkContext.getPreferredLocs method.

TaskLocation is a Scala private[spark] sealed trait, so all the available implementations of the TaskLocation trait are in a single Scala file.
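
The shape of such a hierarchy can be sketched as a small standalone model. This is a simplified illustration under stated assumptions, not Spark's source: the enclosing object TaskLocationSketch and the describe helper are hypothetical, the executor field is modeled here as executorId, and the real trait is private[spark], so application code cannot reference it. Because the trait is sealed, the compiler can check that a pattern match covers every implementation.

```scala
// Simplified, standalone model of the TaskLocation hierarchy (not Spark's source).
object TaskLocationSketch {

  // Every location exposes at least a host name.
  sealed trait TaskLocation {
    def host: String
  }

  // A host plus a specific executor on that host (field name is an assumption).
  final case class ExecutorCacheTaskLocation(host: String, executorId: String)
    extends TaskLocation

  // A host on which HDFS caches the data.
  final case class HDFSCacheTaskLocation(host: String) extends TaskLocation

  // A host only.
  final case class HostTaskLocation(host: String) extends TaskLocation

  // Hypothetical helper: the match is exhaustive because the trait is sealed.
  def describe(location: TaskLocation): String = location match {
    case ExecutorCacheTaskLocation(host, executorId) => s"executor $executorId on $host"
    case HDFSCacheTaskLocation(host)                 => s"HDFS-cached data on $host"
    case HostTaskLocation(host)                      => s"host $host"
  }

  def main(args: Array[String]): Unit = {
    val locations = Seq(
      ExecutorCacheTaskLocation("host1", "3"),
      HDFSCacheTaskLocation("host2"),
      HostTaskLocation("host3"))
    locations.foreach(location => println(describe(location)))
  }
}
```

Adding a fourth implementation to this sketch without extending describe would make the compiler warn about a non-exhaustive match, which is the practical benefit of keeping all implementations in one file.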

ExecutorCacheTaskLocation

ExecutorCacheTaskLocation describes a host and an executor.

ExecutorCacheTaskLocation tells the scheduler to prefer the given executor. If that is not possible, the next level of preference is any executor on the same host.
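
The two levels of preference can be illustrated with a small standalone sketch. All names in it (ExecutorPreferenceSketch, Offer, Match, matchLevel) are hypothetical and only model the idea. Spark's scheduler makes this decision with its own locality machinery, which is not reproduced here.

```scala
// Standalone illustration of "preferred executor first, then same host".
object ExecutorPreferenceSketch {

  // Hypothetical stand-in for an executor-level placement preference.
  final case class ExecutorCacheTaskLocation(host: String, executorId: String)

  // Hypothetical resource offer: an executor running on some host.
  final case class Offer(host: String, executorId: String)

  // Hypothetical match levels, from best to worst.
  sealed trait Match
  case object SameExecutor extends Match
  case object SameHost extends Match
  case object Elsewhere extends Match

  // Compare an offer against the preference: executor first, then host.
  def matchLevel(preferred: ExecutorCacheTaskLocation, offer: Offer): Match =
    if (offer.host == preferred.host && offer.executorId == preferred.executorId) SameExecutor
    else if (offer.host == preferred.host) SameHost
    else Elsewhere

  def main(args: Array[String]): Unit = {
    val preferred = ExecutorCacheTaskLocation("host1", "3")
    println(matchLevel(preferred, Offer("host1", "3")))
    println(matchLevel(preferred, Offer("host1", "7")))
    println(matchLevel(preferred, Offer("host2", "1")))
  }
}
```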

HDFSCacheTaskLocation

HDFSCacheTaskLocation describes a host on which the data is cached by HDFS.

HDFSCacheTaskLocation is used exclusively when HadoopRDD and NewHadoopRDD are requested for their placement preferences (aka preferred locations).

HostTaskLocation

HostTaskLocation describes a host only.