== [[ResultTask]] ResultTask -- Task to Compute Result for ResultStage
ResultTask is a scheduler:Task.md[Task] that <
DAGScheduler submits missing tasks for a
ResultTask is created with a <
[[internal-registries]] .ResultTask's Internal Registries and Counters [cols="1,2",options="header",width="100%"] |=== | Name | Description
preferredLocs | Collection of scheduler:TaskLocation.md[TaskLocations].
Corresponds directly to unique entries in <
locs is not defined, it is empty, and no task location preferences are defined.
Initialized when <
Used exclusively when
ResultTask is requested for <
=== [[creating-instance]] Creating ResultTask Instance
ResultTask takes the following when created:
stageId-- the stage the task is executed for
stageAttemptId-- the stage attempt id
- [[taskBinary]] ROOT:Broadcast.md with the serialized task (as
Array[Byte]). The broadcast contains of a serialized pair of
RDDand the function to execute.
- [[partition]] spark-rdd-Partition.md[Partition] to compute
- [[locs]] Collection of scheduler:TaskLocation.md[TaskLocations], i.e. preferred locations (executors) to execute the task on
- [[localProperties]] local
- [[serializedTaskMetrics]] The stage's serialized executor:TaskMetrics.md (as
- [[jobId]] (optional) spark-scheduler-ActiveJob.md[Job] id
- [[appId]] (optional) Application id
- [[appAttemptId]] (optional) Application attempt id
ResultTask initializes the <
preferredLocations is part of scheduler:Task.md#contract[Task contract].
preferredLocations simply returns <
=== [[runTask]] Deserialize RDD and Function (From Broadcast) and Execute Function (on RDD Partition) --
runTask(context: TaskContext): U¶
U is the type of a result as defined when <
runTask deserializes a RDD and a function from the <
runTask is part of scheduler:Task.md#contract[Task contract] to run a task.
runTask starts by tracking the time required to deserialize a RDD and a function to execute.
runTask serializer:Serializer.md#newInstance[creates a new closure
runTask uses core:SparkEnv.md#closureSerializer[
SparkEnv to access the current closure
runTask serializer:Serializer.md#deserialize[requests the closure
Serializer to deserialize an
RDD and the function to execute] (from <
runTask records scheduler:Task.md#_executorDeserializeTime[_executorDeserializeTime] and scheduler:Task.md#_executorDeserializeCpuTime[_executorDeserializeCpuTime] properties.
In the end,
runTask executes the function (passing in the input
context and the rdd:index.md#iterator[records from
partition of the RDD]).
partition to use to access the records in a deserialized RDD is defined when <