

TaskLocation represents a placement preference of an RDD partition, i.e. a hint about where to submit tasks for execution.

TaskLocations are tracked by DAGScheduler for submitting the missing tasks of a stage.

TaskLocations are available as the preferredLocations of a task.

[[host]] Every TaskLocation describes its location by host name, but an implementation can also carry other location-related metadata (e.g. an executor ID).

The TaskLocations of a partition of an RDD are available using the SparkContext.getPreferredLocs method.
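The following is a minimal, standalone sketch of how such a lookup can work. It assumes the order we understand DAGScheduler to follow: cached block locations first, then the RDD's own placement hints, then those of a narrow (one-to-one) parent. `Node` and `preferredLocs` are hypothetical names. The model returns plain host names rather than TaskLocations and ignores the bookkeeping (such as a set of already-visited RDDs) and the many-to-one dependencies the real code has to handle.

```scala
// Hypothetical, heavily simplified model of an RDD (illustration only, not Spark's RDD).
final case class Node(
    name: String,
    cached: Map[Int, Seq[String]] = Map.empty,    // partition -> hosts holding cached blocks
    preferred: Map[Int, Seq[String]] = Map.empty, // partition -> the RDD's own placement hints
    narrowParent: Option[Node] = None             // one-to-one narrow dependency, if any
)

// Assumed lookup order: cached locations, then own hints, then the narrow parent's locations.
def preferredLocs(rdd: Node, partition: Int): Seq[String] = {
  val cached = rdd.cached.getOrElse(partition, Nil)
  if (cached.nonEmpty) cached
  else {
    val own = rdd.preferred.getOrElse(partition, Nil)
    if (own.nonEmpty) own
    else rdd.narrowParent.map(p => preferredLocs(p, partition)).getOrElse(Nil)
  }
}

val source    = Node("source", preferred = Map(0 -> Seq("node1", "node2")))
val mapped    = Node("mapped", narrowParent = Some(source))
val cachedRdd = Node("cached", cached = Map(0 -> Seq("node3")), narrowParent = Some(mapped))

println(preferredLocs(mapped, 0))    // List(node1, node2)
println(preferredLocs(cachedRdd, 0)) // List(node3)
```

A mapped RDD with no hints of its own inherits its parent's hosts, while a cached RDD short-circuits the walk with the hosts holding its blocks.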


TaskLocation is a Scala private[spark] sealed trait, so all of its implementations are defined in the same Scala source file.
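To illustrate what sealing the trait buys, here is a minimal, standalone Scala sketch of the hierarchy. It is not Spark's source: the class names mirror the three implementations described below, but the `describe` helper is hypothetical. Because the trait is sealed, the compiler knows every subtype and can warn when a `match` misses one.

```scala
// Simplified stand-in for Spark's TaskLocation hierarchy (not the real source).
// `sealed` restricts subtypes to this file, so pattern matches are checked for exhaustiveness.
sealed trait TaskLocation {
  // Every location is described by a host name.
  def host: String
}

// Prefer a given executor on a host.
final case class ExecutorCacheTaskLocation(host: String, executorId: String) extends TaskLocation

// A host on which the data is cached by HDFS.
final case class HDFSCacheTaskLocation(host: String) extends TaskLocation

// A host only.
final case class HostTaskLocation(host: String) extends TaskLocation

// Hypothetical helper: dropping any case here triggers a non-exhaustive match warning.
def describe(loc: TaskLocation): String = loc match {
  case ExecutorCacheTaskLocation(h, e) => s"executor $e on $h"
  case HDFSCacheTaskLocation(h)        => s"HDFS-cached on $h"
  case HostTaskLocation(h)             => s"host $h"
}

val locs: Seq[TaskLocation] = Seq(
  ExecutorCacheTaskLocation("node1", "3"),
  HDFSCacheTaskLocation("node2"),
  HostTaskLocation("node3")
)
locs.map(describe).foreach(println)
```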

== [[ExecutorCacheTaskLocation]] ExecutorCacheTaskLocation

ExecutorCacheTaskLocation describes a <<host, host>> and an executor.

ExecutorCacheTaskLocation tells the scheduler to prefer the given executor. If that is not possible, the next level of preference is any executor on the same host.
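The fallback can be sketched as follows. This is a hypothetical helper, not Spark's scheduler code: `pickExecutor` and the `freeExecutors` map (executor ID to host) are assumptions for illustration. Spark's TaskSetManager expresses the same idea through locality levels (`PROCESS_LOCAL`, then `NODE_LOCAL`, and so on) combined with delay scheduling, which this sketch leaves out.

```scala
// Hypothetical sketch of the preference order:
//   1) the preferred executor itself, 2) any other free executor on the same host, 3) nothing.
// `freeExecutors` maps executor ID -> host and is an assumption for illustration.
def pickExecutor(
    preferredHost: String,
    preferredExecutorId: String,
    freeExecutors: Map[String, String]): Option[String] = {
  if (freeExecutors.get(preferredExecutorId).contains(preferredHost)) {
    // Level 1: the exact executor is free.
    Some(preferredExecutorId)
  } else {
    // Level 2: any free executor running on the same host (sorted for a deterministic pick).
    freeExecutors.collect { case (id, host) if host == preferredHost => id }
      .toSeq.sorted.headOption
  }
}

val free = Map("1" -> "node1", "2" -> "node1", "7" -> "node2")
println(pickExecutor("node1", "1", free)) // Some(1)
println(pickExecutor("node1", "5", free)) // Some(1)
println(pickExecutor("node3", "9", free)) // None
```

Sorting the candidate executor IDs only makes the choice deterministic for this example.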

== [[HDFSCacheTaskLocation]] HDFSCacheTaskLocation

HDFSCacheTaskLocation describes a <<host, host>> on which the data is cached by HDFS.

Used exclusively when HadoopRDD and NewHadoopRDD are requested for their placement preferences (aka preferred locations).
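A minimal sketch of the distinction, assuming a hypothetical `SplitLocation` record (a host plus a flag saying whether HDFS caches the block in memory on that host) in place of the Hadoop split metadata that HadoopRDD and NewHadoopRDD actually read. Cached replicas become HDFSCacheTaskLocations and the rest plain HostTaskLocations.

```scala
// Simplified stand-ins (not Spark's or Hadoop's classes).
sealed trait TaskLocation { def host: String }
final case class HDFSCacheTaskLocation(host: String) extends TaskLocation
final case class HostTaskLocation(host: String) extends TaskLocation

// Hypothetical per-replica info: the host and whether HDFS caches the block in memory there.
final case class SplitLocation(host: String, inMemory: Boolean)

// Turn replica info into placement preferences, keeping the "cached by HDFS" signal.
def toTaskLocations(infos: Seq[SplitLocation]): Seq[TaskLocation] =
  infos.map { info =>
    if (info.inMemory) HDFSCacheTaskLocation(info.host) // replica cached by HDFS
    else HostTaskLocation(info.host)                     // plain replica on that host
  }

val locs = toTaskLocations(Seq(
  SplitLocation("node1", inMemory = true),
  SplitLocation("node2", inMemory = false)
))
println(locs) // List(HDFSCacheTaskLocation(node1), HostTaskLocation(node2))
```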

== [[HostTaskLocation]] HostTaskLocation

HostTaskLocation describes a <<host, host>> only.