= TaskSet

`TaskSet` is a collection of scheduler:Task.md[tasks] of a single stage (and stage execution attempt).
In other words, a TaskSet represents the missing partitions of a stage that (as tasks) can be run right away based on the data that is already on the cluster, e.g. map output files from previous stages, though they may fail if this data becomes unavailable.
NOTE: Since the <
`TaskSet` is created when `DAGScheduler` is requested to scheduler:DAGScheduler.md#submitMissingTasks[submit the missing tasks of a stage].
NOTE: Once scheduler:DAGScheduler.md#submitMissingTasks[submitted] for execution (to a scheduler:TaskScheduler.md[TaskScheduler]), the execution of the `TaskSet` is managed by a scheduler:TaskSetManager.md[TaskSetManager] that tolerates up to configuration-properties.md#spark.task.maxFailures[spark.task.maxFailures] failures of a task (default: `1` for local mode and `4` for cluster mode).
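
The effect of a task failure limit can be illustrated with a minimal sketch. `FailureTracker` is a hypothetical class (it is not Spark's `TaskSetManager`), and it assumes that failures are counted per task and that the task set is given up on once a single task reaches the limit. The limit of `4` is only an illustrative value.

```scala
// Hypothetical per-task failure counter; NOT Spark's TaskSetManager.
final class FailureTracker(numTasks: Int, maxTaskFailures: Int) {
  require(maxTaskFailures >= 1, "maxTaskFailures must be at least 1")

  // One failure counter per task index
  private val numFailures = Array.fill(numTasks)(0)

  /** Records a failure of the task at `index`.
    * Returns true when that task has reached the failure limit
    * (assumption: the whole task set would then be aborted).
    */
  def recordFailure(index: Int): Boolean = {
    numFailures(index) += 1
    numFailures(index) >= maxTaskFailures
  }
}

object FailureTrackerDemo {
  def main(args: Array[String]): Unit = {
    val tracker = new FailureTracker(numTasks = 2, maxTaskFailures = 4)
    val outcomes = (1 to 4).map(_ => tracker.recordFailure(0))
    println(outcomes.mkString(", ")) // false, false, false, true
  }
}
```

With a limit of `1`, the very first failure of any task would already return `true` in this sketch.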
[[creating-instance]] TaskSet takes the following to be created:

- [[tasks]] Collection of scheduler:Task.md[tasks] (`Array[Task[_]]`)
- [[stageId]] Stage ID
- [[stageAttemptId]] Stage execution attempt ID
- [[priority]] Priority (for FIFO scheduling)
- [[properties]] Key-value properties

[[id]] TaskSet is uniquely identified by an `id` that is the stage ID followed by the stage execution attempt ID with a dot (`.`) in-between:
[stageId].[stageAttemptId]
[[toString]] The textual representation (`toString`) of a TaskSet is `TaskSet [id]`:
TaskSet [stageId].[stageAttemptId]
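
The inputs, the `id` and the textual representation described above can be put together in a small, self-contained sketch. `SketchTask` and `TaskSetSketch` are hypothetical stand-ins introduced here for illustration only; Spark's own `Task` and `TaskSet` classes carry more than this.

```scala
// Simplified, hypothetical stand-in for a Spark task.
final case class SketchTask(partitionId: Int)

// Hypothetical model of a TaskSet with the five inputs listed above.
final class TaskSetSketch(
    val tasks: Array[SketchTask],
    val stageId: Int,
    val stageAttemptId: Int,
    val priority: Int,
    val properties: java.util.Properties) {

  // The id is the stage ID and the stage attempt ID joined with a dot
  val id: String = s"$stageId.$stageAttemptId"

  override def toString: String = s"TaskSet $id"
}

object TaskSetSketchDemo {
  def main(args: Array[String]): Unit = {
    val ts = new TaskSetSketch(
      tasks = Array(SketchTask(0), SketchTask(1)),
      stageId = 3,
      stageAttemptId = 0,
      priority = 1,
      properties = new java.util.Properties())
    println(ts.id) // 3.0
    println(ts)    // TaskSet 3.0
  }
}
```

Note that `3.0` here is a string made of two integers, not a floating-point number.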
== [[fifo-scheduling]] Task Scheduling Prioritization in FIFO Scheduling
The priority of a `TaskSet` is assigned when the `TaskSet` is created (when `DAGScheduler` is requested to scheduler:DAGScheduler.md#submitMissingTasks[submit the missing tasks of a stage]).
Once scheduler:DAGScheduler.md#submitMissingTasks[submitted] for execution (to a scheduler:TaskScheduler.md[TaskScheduler]), the priority of the `TaskSet` becomes the priority of the `TaskSetManager` (which is a `Schedulable`).
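
How a priority could drive FIFO ordering can be sketched as follows. The sketch assumes that a lower priority value is scheduled first and that ties are broken by the lower stage ID; this ordering rule is an assumption about the FIFO comparator and is not stated in the text above. `SchedulableSketch` and `FifoSketch` are hypothetical names.

```scala
// Hypothetical, minimal view of what a FIFO comparator needs from a schedulable entity.
final case class SchedulableSketch(name: String, priority: Int, stageId: Int)

object FifoSketch {
  // Returns true when s1 should be scheduled before s2:
  // lower priority value first, ties broken by the lower stage ID (assumed rule).
  def comesBefore(s1: SchedulableSketch, s2: SchedulableSketch): Boolean = {
    val byPriority = Integer.compare(s1.priority, s2.priority)
    val res =
      if (byPriority != 0) byPriority
      else Integer.compare(s1.stageId, s2.stageId)
    res < 0
  }

  def main(args: Array[String]): Unit = {
    val queue = Seq(
      SchedulableSketch("TaskSet 5.0", priority = 2, stageId = 5),
      SchedulableSketch("TaskSet 3.0", priority = 1, stageId = 3),
      SchedulableSketch("TaskSet 2.0", priority = 1, stageId = 2))
    println(queue.sortWith(comesBefore).map(_.name).mkString(", "))
    // TaskSet 2.0, TaskSet 3.0, TaskSet 5.0
  }
}
```

Under this rule, task sets with equal priority run in stage order, while a task set with a lower priority value runs ahead of any task set with a higher one.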