ResultTask
ResultTask[T, U] is a Task that executes a partition processing function on a partition with records (of type T) to produce a result (of type U) that is sent back to the driver.
T -- [ResultTask] --> U
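The mapping from records of type T to a result of type U can be illustrated with a minimal sketch in plain Scala (no Spark dependency). It assumes the partition processing function has the shape (context, records) => result. `SketchTaskContext` and `runOnPartitions` are hypothetical names used only for illustration and are not part of Spark.

```scala
// Hypothetical stand-in for Spark's TaskContext; NOT a Spark class.
final case class SketchTaskContext(partitionId: Int)

object PartitionFunctionSketch {
  // Applies the partition processing function once per partition,
  // producing one result of type U for each partition of records of type T.
  def runOnPartitions[T, U](
      partitions: Seq[Seq[T]],
      func: (SketchTaskContext, Iterator[T]) => U): Seq[U] =
    partitions.zipWithIndex.map { case (records, id) =>
      func(SketchTaskContext(id), records.iterator)
    }
}
```

For example, counting the records of two partitions, `Seq("a", "b")` and `Seq("c")`, with a function that returns `records.size` gives one count per partition. This mirrors how every ResultTask sends its own result back to the driver.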
Creating Instance
ResultTask takes the following to be created:
- Stage ID
- Stage Attempt ID
- Broadcast variable with a serialized task (Broadcast[Array[Byte]])
- Partition to compute
- TaskLocation
- Output ID
- Local Properties
- Serialized TaskMetrics (Array[Byte])
- ActiveJob ID (optional)
- Application ID (optional)
- Application Attempt ID (optional)
- isBarrier flag (default: false)
ResultTask is created when:
- DAGScheduler is requested to submit missing tasks of a ResultStage
Running Task
runTask(context: TaskContext): U
runTask is part of the Task abstraction.
runTask deserializes an RDD and a partition processing function from the broadcast variable (using the Closure Serializer).
In the end, runTask executes the function (on the records from the partition of the RDD) and returns the result.
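The two steps above (deserialize, then execute) can be sketched in plain Scala as follows. This is a simplified model under stated assumptions, not Spark's implementation: plain Java serialization stands in for the Closure Serializer, a vector of partitions stands in for the RDD, and a byte array stands in for the broadcast variable. `SketchContext`, `SumRecords` and `ResultTaskSketch` are hypothetical names.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical stand-in for TaskContext; NOT a Spark class.
final case class SketchContext(stageId: Int, partitionId: Int)

// A serializable partition processing function: (context, records) => result.
final class SumRecords extends ((SketchContext, Iterator[Int]) => Int) with Serializable {
  def apply(context: SketchContext, records: Iterator[Int]): Int = records.sum
}

object ResultTaskSketch {
  type PartitionFunc = (SketchContext, Iterator[Int]) => Int

  // Driver side (assumption): the "RDD" and the function are serialized together.
  def serialize(partitions: Vector[Vector[Int]], func: PartitionFunc): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    try out.writeObject((partitions, func)) finally out.close()
    bytes.toByteArray
  }

  // Executor side: mirrors the two steps described for runTask.
  def runTask(taskBinary: Array[Byte], context: SketchContext): Int = {
    // Step 1: deserialize the "RDD" and the partition processing function.
    val in = new ObjectInputStream(new ByteArrayInputStream(taskBinary))
    val (partitions, func) =
      try in.readObject().asInstanceOf[(Vector[Vector[Int]], PartitionFunc)]
      finally in.close()
    // Step 2: execute the function on the records of this task's partition.
    func(context, partitions(context.partitionId).iterator)
  }
}
```

With partitions `Vector(1, 2, 3)` and `Vector(10, 20)` serialized together with `new SumRecords`, calling `ResultTaskSketch.runTask` for partition 0 returns 6 and for partition 1 returns 30. Each call deserializes its own copy of the data and the function, which is why both have to be serializable.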