
== [[ResultTask]] ResultTask -- Task to Compute Result for ResultStage

ResultTask is a scheduler:Task.md[Task] that <<runTask, executes a function on the records of an RDD partition>> to compute the result of a ResultStage.

<<creating-instance, ResultTask is created>> exclusively when scheduler:DAGScheduler.md#submitMissingTasks[DAGScheduler submits missing tasks for a ResultStage].

ResultTask is created with a <<taskBinary, broadcast>> with the serialized RDD and the function to execute it on, and the <<partition, partition>> to compute.

[[internal-registries]]
.ResultTask's Internal Registries and Counters
[cols="1,2",options="header",width="100%"]
|===
| Name
| Description

| [[preferredLocs]] preferredLocs
| Collection of scheduler:TaskLocation.md[TaskLocations].

Corresponds directly to the unique entries in <<locs, locs>>; when locs is not defined, preferredLocs is empty and no task location preferences are defined.

Initialized when <<creating-instance, ResultTask is created>>.

Used exclusively when ResultTask is requested for the <<preferredLocations, preferred locations>>.

|===

=== [[creating-instance]] Creating ResultTask Instance

ResultTask takes the following when created:

* stageId -- the stage the task is executed for
* stageAttemptId -- the stage attempt id
* [[taskBinary]] ROOT:Broadcast.md[] with the serialized task (as `Array[Byte]`). The broadcast contains a serialized pair of the RDD and the function to execute.
* [[partition]] spark-rdd-Partition.md[Partition] to compute
* [[locs]] Collection of scheduler:TaskLocation.md[TaskLocations], i.e. preferred locations (executors) to execute the task on
* [[outputId]] outputId
* [[localProperties]] Local `Properties`
* [[serializedTaskMetrics]] The stage's serialized executor:TaskMetrics.md[] (as `Array[Byte]`)
* [[jobId]] (optional) spark-scheduler-ActiveJob.md[Job] id
* [[appId]] (optional) Application id
* [[appAttemptId]] (optional) Application attempt id

ResultTask initializes the <<internal-registries, internal registries and counters>>.
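A compact way to read the parameter list above is as the shape of the constructor. The sketch below is an illustration only: it uses plain Scala stand-ins for the Spark-internal types (`Broadcast`, `Partition`, `TaskLocation`), so the types here are assumptions, not Spark's actual definitions.

```scala
// Plain stand-ins for Spark-internal types (assumptions for illustration).
case class Partition(index: Int)
case class TaskLocation(host: String)

// Simplified sketch of ResultTask's constructor parameters; the broadcast
// wrapper around taskBinary is dropped and Array[Byte] is used directly.
case class ResultTaskSketch(
  stageId: Int,                          // stage the task is executed for
  stageAttemptId: Int,                   // stage attempt id
  taskBinary: Array[Byte],               // serialized (RDD, function) pair
  partition: Partition,                  // partition to compute
  locs: Seq[TaskLocation],               // preferred locations (executors)
  outputId: Int,
  localProperties: java.util.Properties,
  serializedTaskMetrics: Array[Byte],
  jobId: Option[Int] = None,             // optional job id
  appId: Option[String] = None,          // optional application id
  appAttemptId: Option[String] = None)   // optional application attempt id

val task = ResultTaskSketch(
  stageId = 0, stageAttemptId = 0, taskBinary = Array.empty,
  partition = Partition(0), locs = Nil, outputId = 0,
  localProperties = new java.util.Properties(),
  serializedTaskMetrics = Array.empty)
println(task.jobId)
```

Note that the last three parameters are optional and default to `None`, matching the "(optional)" entries in the list above.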

=== [[preferredLocations]] preferredLocations Method

[source, scala]
----
preferredLocations: Seq[TaskLocation]
----

NOTE: preferredLocations is part of scheduler:Task.md#contract[Task contract].

preferredLocations simply returns the <<preferredLocs, preferredLocs>> internal property.
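The preferred-locations behavior described above (keep the unique entries of <<locs, locs>>, or report no preferences when locs is not defined) can be sketched as follows; task locations are simplified to plain host-name strings here, which is an assumption for illustration:

```scala
// Sketch of how ResultTask derives preferredLocs from the locs it was
// created with: unique entries, or empty when locs is not defined (null),
// meaning no task location preferences.
def preferredLocs(locs: Seq[String]): Seq[String] =
  if (locs == null) Nil      // locs not defined: no preferences
  else locs.distinct         // unique entries only

assert(preferredLocs(Seq("host0", "host1", "host0")) == Seq("host0", "host1"))
assert(preferredLocs(null) == Nil)
```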

=== [[runTask]] Deserialize RDD and Function (From Broadcast) and Execute Function (on RDD Partition) -- runTask Method

[source, scala]
----
runTask(context: TaskContext): U
----

NOTE: U is the type of the result as defined when <<creating-instance, ResultTask is created>>.

runTask deserializes an RDD and a function from the <<taskBinary, taskBinary>> broadcast and then executes the function (on the records of the RDD <<partition, partition>>).

NOTE: runTask is part of scheduler:Task.md#contract[Task contract] to run a task.

Internally, runTask starts by tracking the time required to deserialize the RDD and the function to execute.

runTask serializer:Serializer.md#newInstance[creates a new closure Serializer].

NOTE: runTask uses core:SparkEnv.md#closureSerializer[SparkEnv to access the current closure Serializer].

runTask serializer:Serializer.md#deserialize[requests the closure Serializer to deserialize the RDD and the function to execute] (from the <<taskBinary, taskBinary>> broadcast).

NOTE: The <<taskBinary, taskBinary>> broadcast is defined when <<creating-instance, ResultTask is created>>.

runTask records scheduler:Task.md#_executorDeserializeTime[_executorDeserializeTime] and scheduler:Task.md#_executorDeserializeCpuTime[_executorDeserializeCpuTime] properties.

In the end, runTask executes the function (passing in the input context and the rdd:index.md#iterator[records of the partition of the RDD]).

NOTE: The partition used to access the records in the deserialized RDD is defined when <<creating-instance, ResultTask is created>>.
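The essence of the steps above (deserialize a pair of data and function from the task binary, then execute the function on the partition's records) can be sketched in a self-contained way. Spark's types are stood in by plain Scala values here, which is an assumption for illustration: the "RDD partition" is a `Seq` of records and the task binary is a Java-serialized `(records, function)` pair, mirroring the serialized (RDD, function) pair in the <<taskBinary, taskBinary>> broadcast.

```scala
import java.io._

// Serialize any object with plain Java serialization (Spark's closure
// serializer is pluggable; Java serialization stands in for it here).
def serialize(obj: AnyRef): Array[Byte] = {
  val bos = new ByteArrayOutputStream()
  val oos = new ObjectOutputStream(bos)
  oos.writeObject(obj)
  oos.close()
  bos.toByteArray
}

def deserialize[T](bytes: Array[Byte]): T =
  new ObjectInputStream(new ByteArrayInputStream(bytes))
    .readObject().asInstanceOf[T]

// "runTask": deserialize the records and the function from the task
// binary, then execute the function on the records and return its result
// (of type U, fixed when the task was created).
def runTask[T, U](taskBinary: Array[Byte]): U = {
  val (records, func) = deserialize[(Seq[T], Iterator[T] => U)](taskBinary)
  func(records.iterator)
}

// "Driver" side: serialize the (records, function) pair;
// "executor" side: run the task against it.
val taskBinary = serialize((Seq(1, 2, 3, 4), (it: Iterator[Int]) => it.sum))
println(runTask[Int, Int](taskBinary))
```

The real runTask additionally records the deserialization wall-clock and CPU times (as described above) before invoking the function.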


Last update: 2020-10-06