ActiveJob (job, action job) is a top-level work item (computation) submitted to DAGScheduler for execution (usually to compute the result of an
Executing a job is equivalent to computing the partitions of the RDD an action has been executed upon. The number of partitions (
numPartitions) to compute in a job depends on the type of a stage (ResultStage or ShuffleMapStage).
A job starts with a single target RDD, but can ultimately include other
RDDs that are all part of RDD lineage.
The parent stages are always ShuffleMapStages.
Not always all partitions have to be computed for ResultStages (e.g. for actions like
ActiveJob takes the following to be created:
ActiveJob is created when:
- Map-Stage Job that computes the map output files for a ShuffleMapStage (for
submitMapStage) before any downstream stages are submitted
- Result job that computes a ResultStage to execute an action
Finished (Computed) Partitions¶
finished registry of flags to track partitions that have already been computed (
true) or not (