Skip to content


ResultTask is a[Task] that <>.

<ResultTask is created>> exclusively when[DAGScheduler submits missing tasks for a ResultStage].

ResultTask is created with a <> with the RDD and the function to execute it on and the <>.

[[internal-registries]] .ResultTask's Internal Registries and Counters [cols="1,2",options="header",width="100%"] |=== | Name | Description

| [[preferredLocs]] preferredLocs | Collection of[TaskLocations].

Corresponds directly to unique entries in <> with the only rule that when locs is not defined, it is empty, and no task location preferences are defined.

Initialized when <ResultTask is created>>.

Used exclusively when ResultTask is requested for <>.


Creating Instance

ResultTask takes the following when created:

  • stageId -- the stage the task is executed for
  • stageAttemptId -- the stage attempt id
  • [[taskBinary]][] with the serialized task (as Array[Byte]). The broadcast contains of a serialized pair of RDD and the function to execute.
  • [[partition]] Partition to compute
  • [[locs]] Collection of[TaskLocations], i.e. preferred locations (executors) to execute the task on
  • [[outputId]] outputId
  • [[localProperties]] local Properties
  • [[serializedTaskMetrics]] The stage's serialized[] (as Array[Byte])
  • [[jobId]] (optional)[Job] id
  • [[appId]] (optional) Application id
  • [[appAttemptId]] (optional) Application attempt id

ResultTask initializes the <>.

=== [[preferredLocations]] preferredLocations Method

[source, scala]

preferredLocations: Seq[TaskLocation]

NOTE: preferredLocations is part of[Task contract].

preferredLocations simply returns <> internal property.

=== [[runTask]] Deserialize RDD and Function (From Broadcast) and Execute Function (on RDD Partition) -- runTask Method

[source, scala]

runTask(context: TaskContext): U

NOTE: U is the type of a result as defined when <ResultTask is created>>.

runTask deserializes a RDD and a function from the <> and then executes the function (on the records from the RDD <>).

NOTE: runTask is part of[Task contract] to run a task.

Internally, runTask starts by tracking the time required to deserialize a RDD and a function to execute.

runTask[creates a new closure Serializer].

NOTE: runTask uses[SparkEnv to access the current closure Serializer].

runTask[requests the closure Serializer to deserialize an RDD and the function to execute] (from <> broadcast).

NOTE: <> broadcast is defined when <ResultTask is created>>.

runTask records[_executorDeserializeTime] and[_executorDeserializeCpuTime] properties.

In the end, runTask executes the function (passing in the input context and the[records from partition of the RDD]).

NOTE: partition to use to access the records in a deserialized RDD is defined when <ResultTask was created>>.

Last update: 2020-11-27