Schedulable Pool

Pool is a Schedulable entity that represents a tree of TaskSetManagers, i.e. it contains a collection of TaskSetManagers or other Pools.

A Pool has a mandatory name, a scheduling mode, and initial minShare and weight values, all of which are defined when it is created.

An instance of Pool is created when TaskSchedulerImpl is initialized.
The TaskScheduler Contract and Schedulable Contract both require that their entities have a rootPool of type Pool.

increaseRunningTasks Method

FIXME

decreaseRunningTasks Method

FIXME

taskSetSchedulingAlgorithm Attribute

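Using the scheduling mode (given when a Pool object is created), Pool selects a SchedulingAlgorithm and assigns it to taskSetSchedulingAlgorithm: FIFOSchedulingAlgorithm for the FIFO mode and FairSchedulingAlgorithm for the FAIR mode.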

It throws an IllegalArgumentException when an unsupported scheduling mode is passed in:

Unsupported spark.scheduler.mode: [schedulingMode]
Read about the scheduling modes in SchedulingMode.
taskSetSchedulingAlgorithm is used in getSortedTaskSetQueue.
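
The selection described above can be sketched as a pattern match. This is a simplified, self-contained model rather than Spark's code: Mode, Algorithm, and selectAlgorithm are hypothetical stand-ins, and NONE is used here only as an example of a mode that is not supported.

```scala
// Simplified stand-ins for illustration; these are not Spark's own types.
sealed trait Mode
case object FIFO extends Mode
case object FAIR extends Mode
case object NONE extends Mode

sealed trait Algorithm
case object FifoAlgorithm extends Algorithm
case object FairAlgorithm extends Algorithm

// Pick the sorting algorithm for a scheduling mode and fail fast otherwise.
def selectAlgorithm(mode: Mode): Algorithm = mode match {
  case FIFO => FifoAlgorithm
  case FAIR => FairAlgorithm
  case other =>
    throw new IllegalArgumentException(s"Unsupported spark.scheduler.mode: $other")
}
```

Because the algorithm is chosen once from the mode, a pool in this model cannot change how it sorts after it has been created.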

Getting TaskSetManagers Sorted — getSortedTaskSetQueue Method

getSortedTaskSetQueue is part of the Schedulable Contract.

getSortedTaskSetQueue sorts all the Schedulables in schedulableQueue using the SchedulingAlgorithm held in the internal taskSetSchedulingAlgorithm.
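
Since a Pool is a tree, one way to picture the sort is to order the direct children and then flatten nested pools. The sketch below is a hypothetical model under that assumption (depth-first flattening); Node, TaskSet, PoolNode, and sortedTaskSets are illustrative names, not Spark's API.

```scala
// Hypothetical, simplified model; not Spark's Schedulable hierarchy.
sealed trait Node { def name: String; def priority: Int }
case class TaskSet(name: String, priority: Int) extends Node
case class PoolNode(name: String, priority: Int, children: List[Node]) extends Node

// Sort the direct children with the given comparator, then flatten
// nested pools depth-first so that only task sets remain in the result.
def sortedTaskSets(pool: PoolNode, lessThan: (Node, Node) => Boolean): List[TaskSet] =
  pool.children.sortWith(lessThan).flatMap {
    case ts: TaskSet => List(ts)
    case p: PoolNode => sortedTaskSets(p, lessThan)
  }

val root = PoolNode("root", 0, List(
  TaskSet("ts-3", 3),
  PoolNode("nested", 1, List(TaskSet("ts-b", 2), TaskSet("ts-a", 1))),
  TaskSet("ts-0", 0)))

// Example comparator: a lower priority value sorts first.
val byPriority = (a: Node, b: Node) => a.priority < b.priority
val sortedNames = sortedTaskSets(root, byPriority).map(_.name)
// sortedNames == List("ts-0", "ts-a", "ts-b", "ts-3")
```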

Schedulables by Name — schedulableNameToSchedulable Registry

schedulableNameToSchedulable = new ConcurrentHashMap[String, Schedulable]

schedulableNameToSchedulable is a lookup table of Schedulable objects by their names.

Besides the obvious usage in the housekeeping methods addSchedulable, removeSchedulable, and getSchedulableByName from the Schedulable Contract, it is used only in SparkContext.getPoolForName.

addSchedulable Method

addSchedulable is part of the Schedulable Contract.

addSchedulable adds a Schedulable to the schedulableQueue and schedulableNameToSchedulable.

More importantly, it sets the parent of the added Schedulable to the Pool itself.

removeSchedulable Method

removeSchedulable is part of the Schedulable Contract.

removeSchedulable removes a Schedulable from the schedulableQueue and schedulableNameToSchedulable.

removeSchedulable is the inverse of addSchedulable.
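
The bookkeeping done by addSchedulable and removeSchedulable can be modelled in a few lines. This is a minimal sketch with hypothetical types (Item, Leaf, MiniPool); the use of ConcurrentLinkedQueue for the queue is an assumption made for the example, while the ConcurrentHashMap registry follows the definition shown earlier.

```scala
import java.util.concurrent.{ConcurrentHashMap, ConcurrentLinkedQueue}

// Hypothetical minimal model; a real Schedulable has more members.
trait Item {
  def name: String
  var parent: Option[MiniPool] = None
}
class Leaf(val name: String) extends Item

class MiniPool(val name: String) extends Item {
  // Queue type is an assumption made for this sketch.
  val schedulableQueue = new ConcurrentLinkedQueue[Item]
  val schedulableNameToSchedulable = new ConcurrentHashMap[String, Item]

  // Register the item in both structures and make this pool its parent.
  def addSchedulable(item: Item): Unit = {
    schedulableQueue.add(item)
    schedulableNameToSchedulable.put(item.name, item)
    item.parent = Some(this)
  }

  // Undo the registration in both structures.
  def removeSchedulable(item: Item): Unit = {
    schedulableQueue.remove(item)
    schedulableNameToSchedulable.remove(item.name)
  }
}

val pool = new MiniPool("production")
val taskSet = new Leaf("TaskSet_0.0")
pool.addSchedulable(taskSet)
```

Keeping the queue and the name registry in step is what lets a lookup by name and a sorted traversal see the same set of children.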

SchedulingAlgorithm

SchedulingAlgorithm is the interface for a sorting algorithm to sort Schedulables.

There are currently two SchedulingAlgorithms:

FIFOSchedulingAlgorithm

FIFOSchedulingAlgorithm is a scheduling algorithm that compares Schedulables by their priority first and, when equal, by their stageId.

priority and stageId are part of the Schedulable Contract.
FIXME A picture is worth a thousand words. How to picture the algorithm?
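
In lieu of a picture, the comparison can be sketched in code. This is a minimal sketch that assumes lower priority and lower stageId values are scheduled first; Entry and fifoLessThan are hypothetical names for illustration, not Spark's API.

```scala
// Hypothetical record holding only the two fields the comparison needs.
case class Entry(name: String, priority: Int, stageId: Int)

// Returns true when `a` should be scheduled before `b`:
// lower priority value first, ties broken by lower stageId (assumed direction).
def fifoLessThan(a: Entry, b: Entry): Boolean =
  if (a.priority != b.priority) a.priority < b.priority
  else a.stageId < b.stageId

val queue = List(Entry("c", 1, 4), Entry("a", 0, 2), Entry("b", 1, 3))
val ordered = queue.sortWith(fifoLessThan).map(_.name)
// ordered == List("a", "b", "c")
```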

FairSchedulingAlgorithm

FairSchedulingAlgorithm is a scheduling algorithm that compares Schedulables by their minShare, runningTasks, and weight.

minShare, runningTasks, and weight are part of the Schedulable Contract.
Figure 1. FairSchedulingAlgorithm

For each input Schedulable, minShareRatio is computed as runningTasks divided by minShare (with minShare treated as at least 1), while taskToWeightRatio is runningTasks divided by weight.
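
The two ratios, and one plausible way of combining them into an ordering, can be sketched as follows. Only the ratio formulas come from the text above; the rule that a Schedulable running fewer tasks than its minShare goes first, and the tie-break on name, are assumptions made for this sketch. Sched, minShareRatio, taskToWeightRatio, and fairLessThan are hypothetical names.

```scala
// Hypothetical record with the three inputs named in the text.
case class Sched(name: String, minShare: Int, runningTasks: Int, weight: Int)

// runningTasks divided by minShare, with minShare treated as at least 1.
def minShareRatio(s: Sched): Double =
  s.runningTasks.toDouble / math.max(s.minShare, 1)

// runningTasks divided by weight.
def taskToWeightRatio(s: Sched): Double =
  s.runningTasks.toDouble / s.weight

// Assumed ordering: entries below their minShare ("needy") go first;
// two needy entries compare by minShareRatio, two non-needy entries
// by taskToWeightRatio; remaining ties fall back to the name.
def fairLessThan(a: Sched, b: Sched): Boolean = {
  val aNeedy = a.runningTasks < a.minShare
  val bNeedy = b.runningTasks < b.minShare
  if (aNeedy != bNeedy) aNeedy
  else {
    val (ra, rb) =
      if (aNeedy) (minShareRatio(a), minShareRatio(b))
      else (taskToWeightRatio(a), taskToWeightRatio(b))
    if (ra != rb) ra < rb else a.name < b.name
  }
}

val pools = List(
  Sched("busy", minShare = 2, runningTasks = 6, weight = 1),
  Sched("starved", minShare = 4, runningTasks = 1, weight = 1),
  Sched("heavy", minShare = 2, runningTasks = 6, weight = 3))
val fairOrder = pools.sortWith(fairLessThan).map(_.name)
// fairOrder == List("starved", "heavy", "busy")
```

In this example "starved" runs fewer tasks than its minShare and so sorts first, while "heavy" precedes "busy" because its larger weight gives it the smaller taskToWeightRatio.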

Finding Schedulable by Name — getSchedulableByName Method

getSchedulableByName(schedulableName: String): Schedulable
getSchedulableByName is part of the Schedulable Contract to find a Schedulable by name.

getSchedulableByName…​FIXME