ResultTask — Task to Compute Result for ResultStage

ResultTask is created with a broadcast variable that holds the serialized RDD and the function to execute on it, together with the partition to compute.

Table 1. ResultTask’s Internal Registries and Counters
Name Description

preferredLocs

Collection of TaskLocations.

Holds the unique entries of locs. When locs is not defined, the collection is empty and the task has no location preferences.

Initialized when ResultTask is created.

Used exclusively when ResultTask is requested for preferred locations.
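The rule described above (unique entries of locs, or nothing when locs is not defined) can be sketched in a few lines. This is a minimal illustration and not Spark's source: HostLocation is a hypothetical stand-in for TaskLocation, "not defined" is assumed to mean null, and distinct is used to deduplicate while keeping the input order.

```scala
// Hypothetical stand-in for Spark's TaskLocation; only a host name is modelled.
final case class HostLocation(host: String)

object PreferredLocsSketch {
  // Returns the unique entries of locs, or an empty collection when locs
  // is not defined (modelled here as null, which is an assumption).
  def preferredLocs(locs: Seq[HostLocation]): Seq[HostLocation] =
    if (locs == null) Nil else locs.distinct
}
```

With this sketch, a locs of hosts a, b, a yields the two entries a and b, and an undefined locs yields no preferences at all.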

Creating ResultTask Instance

ResultTask takes the following when created:

  • stageId — the stage the task is executed for

  • stageAttemptId — the stage attempt id

  • Broadcast variable with the serialized task (as Array[Byte]). The broadcast contains a serialized pair of the RDD and the function to execute.

  • Partition to compute

  • Collection of TaskLocations, i.e. preferred locations (executors) to execute the task on

  • outputId

  • local Properties

  • The stage’s serialized TaskMetrics (as Array[Byte])

  • (optional) Job id

  • (optional) Application id

  • (optional) Application attempt id
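To make the parameter list above easier to picture, here is a simplified model of it as a Scala class. It is a sketch under assumptions: BroadcastSketch, PartitionSketch and TaskLocationSketch are hypothetical stand-ins for the Spark types, and the parameter names and types not spelled out in the list (for example locs, or Option[Int] for the job id) are illustrative guesses.

```scala
import java.util.Properties

// Hypothetical, simplified stand-ins for Spark's Broadcast, Partition and TaskLocation.
final case class BroadcastSketch[A](value: A)
final case class PartitionSketch(index: Int)
final case class TaskLocationSketch(host: String)

// Parameters follow the order of the list above; names and types are assumptions.
final class ResultTaskSketch(
    val stageId: Int,
    val stageAttemptId: Int,
    val taskBinary: BroadcastSketch[Array[Byte]],
    val partition: PartitionSketch,
    val locs: Seq[TaskLocationSketch],
    val outputId: Int,
    val localProperties: Properties,
    val serializedTaskMetrics: Array[Byte],
    val jobId: Option[Int] = None,            // optional
    val appId: Option[String] = None,         // optional
    val appAttemptId: Option[String] = None)  // optional

object ResultTaskSketchDemo {
  // The three optional ids are left out and fall back to None.
  val task = new ResultTaskSketch(
    stageId = 0,
    stageAttemptId = 0,
    taskBinary = BroadcastSketch(Array.empty[Byte]),
    partition = PartitionSketch(3),
    locs = Seq(TaskLocationSketch("host-1")),
    outputId = 3,
    localProperties = new Properties(),
    serializedTaskMetrics = Array.empty[Byte])
}
```

Modelling the last three parameters as Option values with None defaults is one way to express "optional"; callers that know the job or application id can pass Some(...) instead.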

ResultTask initializes the internal registries and counters.

preferredLocations Method

preferredLocations: Seq[TaskLocation]
preferredLocations is part of the Task contract.

preferredLocations simply returns the preferredLocs internal property.

Deserialize RDD and Function (From Broadcast) and Execute Function (on RDD Partition) — runTask Method

runTask(context: TaskContext): U
U is the type of a result as defined when ResultTask is created.

runTask deserializes an RDD and a function from the broadcast and then executes the function (on the records from the RDD partition).

runTask is part of the Task contract to run a task.

Internally, runTask starts by tracking the time required to deserialize an RDD and a function to execute.

The taskBinary broadcast is defined when ResultTask is created.

runTask records _executorDeserializeTime and _executorDeserializeCpuTime properties.

In the end, runTask executes the function (passing in the input context and the records from the partition of the RDD).

The partition used to access the records in the deserialized RDD is defined when ResultTask is created.
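The runTask steps (deserialize the pair while timing it, then apply the function to the records of one partition) can be sketched without Spark. Everything named here is an assumption made for illustration: plain Java serialization replaces Spark's serializer, the "RDD" is just a Vector of partitions, and SumRecords and ContextSketch are hypothetical stand-ins for the shipped function and the TaskContext.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}
import java.lang.management.ManagementFactory

// Hypothetical stand-in for TaskContext.
final case class ContextSketch(partitionId: Int)

// Hypothetical stand-in for the function shipped with the task: sums the records.
final class SumRecords extends Serializable {
  def apply(context: ContextSketch, records: Iterator[Int]): Int = records.sum
}

object RunTaskSketch {
  // The "RDD" is modelled as its partitions; the pair plays the role of the task binary.
  type Payload = (Vector[Vector[Int]], SumRecords)

  // Plain Java serialization stands in for Spark's serializer (an assumption).
  def serialize(payload: Payload): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    try out.writeObject(payload) finally out.close()
    bytes.toByteArray
  }

  // Returns (result, deserialization wall-clock time in ms, deserialization CPU time in ns).
  def runTask(taskBinary: Array[Byte], partition: Int, context: ContextSketch): (Int, Long, Long) = {
    val threadMXBean = ManagementFactory.getThreadMXBean
    val cpuSupported = threadMXBean.isCurrentThreadCpuTimeSupported
    val startTime = System.currentTimeMillis()
    val startCpuTime = if (cpuSupported) threadMXBean.getCurrentThreadCpuTime else 0L

    // Step 1: deserialize the (RDD, function) pair from the task binary.
    val in = new ObjectInputStream(new ByteArrayInputStream(taskBinary))
    val (partitions, func) =
      try in.readObject().asInstanceOf[Payload]
      finally in.close()

    // Step 2: record how long deserialization took.
    val deserializeTime = System.currentTimeMillis() - startTime
    val deserializeCpuTime =
      if (cpuSupported) threadMXBean.getCurrentThreadCpuTime - startCpuTime else 0L

    // Step 3: execute the function on the records of the requested partition.
    val result = func(context, partitions(partition).iterator)
    (result, deserializeTime, deserializeCpuTime)
  }
}
```

Calling RunTaskSketch.runTask on the serialized form of (Vector(Vector(1, 2, 3), Vector(10, 20)), new SumRecords) with partition 1 sums the second partition only. The two timings play the role of _executorDeserializeTime and _executorDeserializeCpuTime in this simplified model.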