TaskLocation

TaskLocation represents a placement preference of an RDD partition, i.e. a hint about where to submit scheduler:Task.md[tasks] for execution.

TaskLocations are tracked by scheduler:DAGScheduler.md#cacheLocs[DAGScheduler] for scheduler:DAGScheduler.md#submitMissingTasks[submitting missing tasks of a stage].

TaskLocation is available as scheduler:Task.md#preferredLocations[preferredLocations] of a task.

[[host]] Every TaskLocation describes the location by host name, but could also use other location-related metadata.

The TaskLocations for a given RDD and partition are available using the ROOT:SparkContext.md#getPreferredLocs[SparkContext.getPreferredLocs] method.

Sealed

TaskLocation is a Scala private[spark] sealed trait, so all the implementations of the trait are defined in a single Scala file.
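
A sealed trait limits its implementations to one file, which lets the Scala compiler check pattern matches for exhaustiveness. The following is a minimal standalone sketch of that idea, not Spark's own code. The wrapper object TaskLocationSketch, the executorId field, and the describe helper are assumptions made for illustration, and the private[spark] modifier is left out so the snippet compiles outside of Spark.

```scala
object TaskLocationSketch {
  // Simplified stand-in for the sealed trait (assumption: no private[spark] here).
  sealed trait TaskLocation {
    def host: String
  }

  // A host plus a specific executor on that host.
  final case class ExecutorCacheTaskLocation(host: String, executorId: String)
    extends TaskLocation

  // A host on which the data is cached by HDFS.
  final case class HDFSCacheTaskLocation(host: String) extends TaskLocation

  // A host only.
  final case class HostTaskLocation(host: String) extends TaskLocation

  // Hypothetical helper: because the trait is sealed, the compiler can warn
  // when one of the three cases is missing from this match.
  def describe(loc: TaskLocation): String = loc match {
    case ExecutorCacheTaskLocation(h, e) => s"executor $e on host $h"
    case HDFSCacheTaskLocation(h)        => s"HDFS-cached data on host $h"
    case HostTaskLocation(h)             => s"host $h"
  }
}
```

Removing one of the case branches from describe should make the compiler report a non-exhaustive match, which is the practical benefit of sealing the trait.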

== [[ExecutorCacheTaskLocation]] ExecutorCacheTaskLocation

ExecutorCacheTaskLocation describes a <<host,host>> and an executor.

ExecutorCacheTaskLocation tells the scheduler to prefer the given executor. If that is not possible, the next level of preference is any executor on the same host.
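
The two levels of preference can be illustrated with a small standalone sketch. It is not how Spark's scheduler implements locality. The Executor case class, the pick function, and the simplified ExecutorCacheTaskLocation are all hypothetical names introduced here to show the order of preference only.

```scala
object ExecutorPreferenceSketch {
  // Hypothetical model of a running executor; not a Spark class.
  final case class Executor(id: String, host: String)

  // Simplified stand-in for ExecutorCacheTaskLocation (assumed fields).
  final case class ExecutorCacheTaskLocation(host: String, executorId: String)

  // First try the exact executor; otherwise fall back to any executor
  // on the same host. Returns None when neither is available.
  def pick(
      pref: ExecutorCacheTaskLocation,
      available: Seq[Executor]): Option[Executor] =
    available
      .find(e => e.id == pref.executorId && e.host == pref.host)
      .orElse(available.find(_.host == pref.host))
}
```

With executors 1 and 2 on hostA and executor 3 on hostB, a preference for executor 2 on hostA selects executor 2, while a preference for a missing executor 9 on hostA falls back to the first executor found on hostA.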

== [[HDFSCacheTaskLocation]] HDFSCacheTaskLocation

HDFSCacheTaskLocation describes a location on a <<host,host>> that is cached by HDFS.

Used exclusively when rdd:spark-rdd-HadoopRDD.md#getPreferredLocations[HadoopRDD] and rdd:spark-rdd-NewHadoopRDD.md#getPreferredLocations[NewHadoopRDD] are requested for their placement preferences (aka preferred locations).

== [[HostTaskLocation]] HostTaskLocation

HostTaskLocation describes a <<host,host>> only.


Last update: 2020-10-08