Task is created when
DAGScheduler is requested to submit missing tasks of a stage.
runTask( context: TaskContext): T
Runs the task (in a TaskContext)
Task is requested to run
Task takes the following to be created:
- Stage ID
- Stage (execution) Attempt ID
- Partition ID
- Local Properties
- Serialized TaskMetrics (
- Job ID (default:
- Application ID (default:
- Application Attempt ID (default:
Task is an abstract class and cannot be created directly. It is created indirectly for the concrete Tasks.
Task is a Java
Serializable so it can be serialized and send over the wire from the driver to executors.
TaskLocations that represent preferred locations (executors) to execute the task on.
Empty by default and so no task location preferences are defined that says the task could be launched on any executor.
Running Task Thread¶
run( taskAttemptId: Long, attemptNumber: Int, metricsSystem: MetricsSystem): T
SparkEnv to access the current BlockManager.
run is a
final method and so must not be overriden.
run creates a Hadoop
CallerContext and sets it.
run runs the task.
This is the moment when the custom
Task's runTask is executed.
In the end,
TaskContextImpl that the task has completed (regardless of the final outcome -- a success or a failure).
In case of any exceptions,
TaskContextImpl that the task has failed.
MemoryStore to release unroll memory for this task (for both
OFF_HEAP memory modes).
SparkEnv to access the current MemoryManager.
run is used when
TaskRunner is requested to run (when
Executor is requested to launch a task (on "Executor task launch worker" thread pool sometime in the future)).
Task can be in one of the following states (as described by
RUNNINGwhen the task is being started.
FINISHEDwhen the task finished with the serialized result.
FAILEDwhen the task fails, e.g. when FetchFailedException,
KILLEDwhen an executor kills a task.
States are the values of
Task status updates are sent from executors to the driver through ExecutorBackend.
Task is finished when it is in one of
FAILED states are considered failures.
Collecting Latest Values of Accumulators¶
collectAccumulatorUpdates( taskFailed: Boolean = false): Seq[AccumulableInfo]
collectAccumulatorUpdates collects the latest values of internal and external accumulators from a task (and returns the values as a collection of AccumulableInfo).
collectAccumulatorUpdates uses TaskContextImpl to access the task's
collectAccumulatorUpdates collects the latest values of:
internal accumulators whose current value is not the zero value and the
RESULT_SIZEaccumulator (regardless whether the value is its zero or not).
collectAccumulatorUpdates returns an empty collection when TaskContextImpl is not initialized.
collectAccumulatorUpdates is used when
TaskRunner runs a task (and sends a task's final results back to the driver).
kill( interruptThread: Boolean): Unit
kill marks the task to be killed, i.e. it sets the internal
_killed flag to
kill calls TaskContextImpl.markInterrupted when
context is set.
interruptThread is enabled and the internal
taskThread is available,
kill interrupts it.
CAUTION: FIXME When could
interruptThread not be set?