A TaskSet is a collection of independent tasks that belong to a single stage (and a single stage execution attempt) and are missing (uncomputed), i.e. their computation results are not available (as RDD blocks on BlockManagers on executors).
In other words, a TaskSet represents the missing partitions of a stage that (as tasks) can be run right away based on the data that is already on the cluster, e.g. map output files from previous stages, though they may fail if this data becomes unavailable.
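The relationship between missing partitions and tasks can be illustrated with a small, self-contained sketch. This is a toy model in Python, not Spark's implementation: the names `Task`, `missing_partitions`, and `build_task_set` are hypothetical, and the set of already computed partitions is simply passed in rather than looked up on the cluster.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Hypothetical stand-in for a task: one partition of one stage."""
    stage_id: int
    partition: int


def missing_partitions(num_partitions, computed):
    """Return the ids of partitions whose results are not yet available."""
    return [p for p in range(num_partitions) if p not in computed]


def build_task_set(stage_id, num_partitions, computed):
    """Create one task per missing partition of the stage."""
    return [Task(stage_id, p) for p in missing_partitions(num_partitions, computed)]


# A stage with 4 partitions, of which 0 and 2 are already computed.
tasks = build_task_set(stage_id=3, num_partitions=4, computed={0, 2})
print([t.partition for t in tasks])  # prints [1, 3]
```

In this model only partitions 1 and 3 become tasks, so the number of tasks in the set can be smaller than the number of partitions of the stage.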
Since the tasks of a
Once submitted for execution (to a TaskScheduler), the execution of the
TaskSet takes the following to be created:
A textual representation of a TaskSet is TaskSet [id].
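As a rough illustration of that textual form, the sketch below models it in Python. The class `TaskSetSketch` is hypothetical, and the id format is an assumption: the text ties a TaskSet to a stage and a stage execution attempt, so the sketch combines the two as `<stage id>.<stage attempt id>`, which the text itself does not spell out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskSetSketch:
    """Hypothetical model of a TaskSet, reduced to what its name needs."""
    stage_id: int
    stage_attempt_id: int

    @property
    def id(self):
        # Assumption: the id is "<stage id>.<stage attempt id>".
        return f"{self.stage_id}.{self.stage_attempt_id}"

    def __str__(self):
        # Follows the "TaskSet [id]" pattern described in the text.
        return f"TaskSet {self.id}"


print(TaskSetSketch(stage_id=3, stage_attempt_id=0))  # prints TaskSet 3.0
```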