ResultTask¶
ResultTask[T, U]
is a Task that executes a partition processing function on a partition with records (of type T
) to produce a result (of type U
) that is sent back to the driver.
T -- [ResultTask] --> U
Creating Instance¶
ResultTask
takes the following to be created:
- Stage ID
- Stage Attempt ID
- Broadcast variable with a serialized task (
Broadcast[Array[Byte]]
) - Partition to compute
- TaskLocation
- Output ID
- Local Properties
- Serialized TaskMetrics (
Array[Byte]
) - ActiveJob ID (optional)
- Application ID (optional)
- Application Attempt ID (optional)
-
isBarrier
flag (default:false
)
ResultTask
is created when:
DAGScheduler
is requested to submit missing tasks of a ResultStage
Running Task¶
runTask(
context: TaskContext): U
runTask
is part of the Task abstraction.
runTask
deserializes a RDD and a partition processing function from the broadcast variable (using the Closure Serializer).
In the end, runTask
executes the function (on the records from the partition of the RDD
).