== [[Pool]] Schedulable Pool

Pool is a Schedulable entity that represents a tree of TaskSetManagers, i.e. it contains a collection of TaskSetManagers or other Pools.

A Pool has a mandatory name, a scheduling mode, an initial minShare and a weight, all of which are defined when it is created.
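The tree structure can be pictured with a small Scala sketch. It is not Spark's Pool class: `SimplePool`, `TaskSetLeaf` and `leafNames` are hypothetical names, the scheduling mode is a plain string, and children are kept in an `ArrayBuffer` rather than a concurrent queue. It only illustrates that the four creation-time values are fixed, while children (task sets or nested pools) are added afterwards.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical, simplified stand-ins for Spark's Schedulable hierarchy.
sealed trait Node { def name: String }

// Stand-in for a TaskSetManager: a leaf of the tree.
final case class TaskSetLeaf(name: String) extends Node

// Stand-in for a Pool: name, mode, minShare and weight are fixed at
// construction time; children (leaves or nested pools) are added later.
final class SimplePool(
    val name: String,
    val schedulingMode: String, // "FIFO" or "FAIR" in this sketch
    val minShare: Int,
    val weight: Int) extends Node {
  val children = ArrayBuffer.empty[Node]
}

// Collects the names of all leaves (task sets) below a node, depth first.
def leafNames(node: Node): List[String] = node match {
  case TaskSetLeaf(n) => List(n)
  case p: SimplePool  => p.children.toList.flatMap(leafNames)
}

// Usage: a root pool holding a nested pool and a task set.
val root = new SimplePool("root", "FAIR", 0, 0)
val production = new SimplePool("production", "FIFO", 2, 1)
production.children += TaskSetLeaf("TaskSet_0.0")
root.children ++= Seq(production, TaskSetLeaf("TaskSet_1.0"))
```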

NOTE: An instance of Pool is created when TaskSchedulerImpl is initialized.

NOTE: The TaskScheduler Contract and SchedulableBuilder Contract both require that their entities have a rootPool of type Pool.

=== [[increaseRunningTasks]] increaseRunningTasks Method


=== [[decreaseRunningTasks]] decreaseRunningTasks Method


=== [[taskSetSchedulingAlgorithm]] taskSetSchedulingAlgorithm Attribute

Using the scheduling mode (given when a Pool object is created), Pool selects a <<SchedulingAlgorithm, SchedulingAlgorithm>> and assigns it to taskSetSchedulingAlgorithm:

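  • <<FIFOSchedulingAlgorithm, FIFOSchedulingAlgorithm>> for FIFO scheduling mode.
  • <<FairSchedulingAlgorithm, FairSchedulingAlgorithm>> for FAIR scheduling mode.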

It throws an IllegalArgumentException when an unsupported scheduling mode is passed in:

----
Unsupported spark.scheduler.mode: [schedulingMode]
----

TIP: Read about the scheduling modes in SchedulingMode.

NOTE: taskSetSchedulingAlgorithm is used in <<getSortedTaskSetQueue, getSortedTaskSetQueue>>.
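The selection logic described in this section can be sketched as a single pattern match. The `SchedulingAlgorithm` trait and the two case objects below are simplified, hypothetical stand-ins that carry no comparison logic, and `selectAlgorithm` is a hypothetical helper. `SchedulingMode` reuses the FAIR and FIFO names from the text; its third value, NONE, plays the role of an unsupported mode in this sketch.

```scala
// Simplified stand-ins (hypothetical): only the names matter here.
object SchedulingMode extends Enumeration {
  val FAIR, FIFO, NONE = Value
}

sealed trait SchedulingAlgorithm
case object FIFOSchedulingAlgorithm extends SchedulingAlgorithm
case object FairSchedulingAlgorithm extends SchedulingAlgorithm

// One algorithm per supported mode; any other mode (e.g. NONE)
// is rejected with an IllegalArgumentException.
def selectAlgorithm(mode: SchedulingMode.Value): SchedulingAlgorithm = mode match {
  case SchedulingMode.FIFO => FIFOSchedulingAlgorithm
  case SchedulingMode.FAIR => FairSchedulingAlgorithm
  case other =>
    throw new IllegalArgumentException(s"Unsupported spark.scheduler.mode: $other")
}
```

Failing fast at creation time means a misconfigured mode surfaces immediately instead of when tasks are first sorted.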

=== [[getSortedTaskSetQueue]] Getting TaskSetManagers Sorted -- getSortedTaskSetQueue Method

NOTE: getSortedTaskSetQueue is part of the Schedulable Contract.

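getSortedTaskSetQueue sorts all the Schedulables in the schedulableQueue queue using the <<SchedulingAlgorithm, SchedulingAlgorithm>> held in the internal <<taskSetSchedulingAlgorithm, taskSetSchedulingAlgorithm>> attribute, and returns the TaskSetManagers in that order.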

NOTE: It is called when TaskSchedulerImpl processes executor resource offers.
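A minimal sketch of the sorting step follows. It assumes, as a simplification, that a nested pool contributes its own sorted queue recursively and that leaves stand in for TaskSetManagers. `Sched`, `TaskSetLeaf`, `PoolNode` and `sortedTaskSetQueue` are hypothetical names; the `lessThan` function plays the role of taskSetSchedulingAlgorithm, and ordering by name in the usage is only for illustration.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical simplified model of the Schedulable tree.
sealed trait Sched {
  def name: String
  // Leaves (TaskSetManager stand-ins) in scheduling order.
  def sortedTaskSetQueue: Seq[TaskSetLeaf]
}

final case class TaskSetLeaf(name: String) extends Sched {
  // A leaf contributes only itself.
  def sortedTaskSetQueue: Seq[TaskSetLeaf] = Seq(this)
}

// `lessThan` returns true when its first argument should run first.
final class PoolNode(val name: String, lessThan: (Sched, Sched) => Boolean) extends Sched {
  val schedulableQueue = ArrayBuffer.empty[Sched]

  // Sort the direct children, then append each child's own sorted queue,
  // so nested pools are flattened in their parent's order.
  def sortedTaskSetQueue: Seq[TaskSetLeaf] =
    schedulableQueue.toSeq.sortWith(lessThan).flatMap(_.sortedTaskSetQueue)
}

// Usage with an illustrative alphabetical ordering.
val byName: (Sched, Sched) => Boolean = (a, b) => a.name < b.name

val inner = new PoolNode("b-pool", byName)
inner.schedulableQueue ++= Seq(TaskSetLeaf("z"), TaskSetLeaf("y"))
val root = new PoolNode("root", byName)
root.schedulableQueue ++= Seq(TaskSetLeaf("c"), inner, TaskSetLeaf("a"))
```

Here the root orders its children as `a`, `b-pool`, `c`, and the nested pool expands in place to `y`, `z`.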

=== [[schedulableNameToSchedulable]] Schedulables by Name -- schedulableNameToSchedulable Registry

[source, scala]
----
val schedulableNameToSchedulable = new ConcurrentHashMap[String, Schedulable]
----

schedulableNameToSchedulable is a lookup table of Schedulable objects by their names.

Besides its obvious use in the housekeeping methods of the Schedulable Contract, such as addSchedulable, removeSchedulable and getSchedulableByName, it is used only in SparkContext.getPoolForName.

=== [[addSchedulable]] addSchedulable Method

NOTE: addSchedulable is part of the Schedulable Contract.

addSchedulable adds a Schedulable to the schedulableQueue and <<schedulableNameToSchedulable, schedulableNameToSchedulable>>.

More importantly, it sets the parent of the added Schedulable to this Pool.

=== [[removeSchedulable]] removeSchedulable Method

NOTE: removeSchedulable is part of the Schedulable Contract.

removeSchedulable removes a Schedulable from the schedulableQueue and <<schedulableNameToSchedulable, schedulableNameToSchedulable>>.

NOTE: removeSchedulable is the opposite of the <<addSchedulable, addSchedulable method>>.
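The housekeeping done by addSchedulable and removeSchedulable can be sketched with the same two Java concurrent collections. `Node` is a hypothetical stand-in for a Schedulable. Whether removal should also reset the parent is not stated above, so this sketch leaves `parent` untouched on removal.

```scala
import java.util.concurrent.{ConcurrentHashMap, ConcurrentLinkedQueue}

// Hypothetical minimal node with the two registries a Pool keeps.
class Node(val name: String) {
  var parent: Node = null
  val schedulableQueue = new ConcurrentLinkedQueue[Node]
  val schedulableNameToSchedulable = new ConcurrentHashMap[String, Node]

  // Register the child in both structures and make this node its parent.
  def addSchedulable(child: Node): Unit = {
    require(child != null, "child must not be null")
    schedulableQueue.add(child)
    schedulableNameToSchedulable.put(child.name, child)
    child.parent = this
  }

  // Undo the registration; this sketch leaves child.parent as it was.
  def removeSchedulable(child: Node): Unit = {
    schedulableQueue.remove(child)
    schedulableNameToSchedulable.remove(child.name)
  }
}

// Usage: add a task set stand-in, record the state, then remove it.
val pool = new Node("production")
val taskSet = new Node("TaskSet_0.0")
pool.addSchedulable(taskSet)
val parentAfterAdd = taskSet.parent
val sizeAfterAdd = pool.schedulableQueue.size
pool.removeSchedulable(taskSet)
```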

=== [[SchedulingAlgorithm]] SchedulingAlgorithm

SchedulingAlgorithm is the interface for algorithms that sort Schedulables.

There are currently two SchedulingAlgorithms:

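  • <<FIFOSchedulingAlgorithm, FIFOSchedulingAlgorithm>> for FIFO scheduling mode.
  • <<FairSchedulingAlgorithm, FairSchedulingAlgorithm>> for FAIR scheduling mode.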

==== [[FIFOSchedulingAlgorithm]] FIFOSchedulingAlgorithm

FIFOSchedulingAlgorithm is a scheduling algorithm that compares Schedulables by their priority first and, when equal, by their stageId.

NOTE: priority and stageId are part of the Schedulable Contract.
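The FIFO comparison can be written as a plain Scala function that returns true when the first Schedulable should run first, which is the shape `sortWith` expects. `Sched` and `fifoComparator` are hypothetical names, and the assumption that lower values run first is noted in the comments.

```scala
// Hypothetical minimal view of a Schedulable: only the fields
// the FIFO comparison reads (plus a name for display).
final case class Sched(name: String, priority: Int, stageId: Int)

// Returns true when s1 should be scheduled before s2, assuming
// lower priority first and, on a tie, lower stageId first.
def fifoComparator(s1: Sched, s2: Sched): Boolean = {
  val byPriority = Integer.compare(s1.priority, s2.priority)
  val res =
    if (byPriority != 0) byPriority
    else Integer.compare(s1.stageId, s2.stageId)
  res < 0
}

// Usage: "b" has the lowest priority; "c" beats "a" on stageId.
val queue = Seq(Sched("a", 1, 4), Sched("b", 0, 7), Sched("c", 1, 2))
val ordered = queue.sortWith(fifoComparator).map(_.name)
```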

CAUTION: FIXME A picture is worth a thousand words. How to picture the algorithm?

==== [[FairSchedulingAlgorithm]] FairSchedulingAlgorithm

FairSchedulingAlgorithm is a scheduling algorithm that compares Schedulables by their minShare, runningTasks, and weight.

NOTE: minShare, runningTasks, and weight are part of the Schedulable Contract.

.FairSchedulingAlgorithm
image::spark-pool-FairSchedulingAlgorithm.png[align="center"]

For each input Schedulable, minShareRatio is runningTasks divided by minShare (with minShare taken as at least 1), while taskToWeightRatio is runningTasks divided by weight.
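A sketch of the fair comparison follows. In addition to the two ratios described above, it assumes the ordering rules listed in the comments: a Schedulable running fewer tasks than its minShare goes first, and ties are broken by name. These rules are modelled on Spark's FairSchedulingAlgorithm but should be checked against the Spark version in use. `Sched` and `fairComparator` are hypothetical names.

```scala
// Hypothetical minimal view of a Schedulable with the fields
// the fair comparison reads.
final case class Sched(name: String, minShare: Int, runningTasks: Int, weight: Int)

// Returns true when s1 should be scheduled before s2. Assumed rules:
//  1. a "needy" entity (runningTasks < minShare) goes before a non-needy one;
//  2. if both are needy, the lower minShareRatio wins;
//  3. otherwise the lower taskToWeightRatio wins;
//  4. remaining ties are broken by name.
def fairComparator(s1: Sched, s2: Sched): Boolean = {
  val s1Needy = s1.runningTasks < s1.minShare
  val s2Needy = s2.runningTasks < s2.minShare
  // minShare is floored at 1 so the ratio never divides by zero.
  val minShareRatio1 = s1.runningTasks.toDouble / math.max(s1.minShare.toDouble, 1.0)
  val minShareRatio2 = s2.runningTasks.toDouble / math.max(s2.minShare.toDouble, 1.0)
  val taskToWeightRatio1 = s1.runningTasks.toDouble / s1.weight
  val taskToWeightRatio2 = s2.runningTasks.toDouble / s2.weight

  if (s1Needy && !s2Needy) true
  else if (!s1Needy && s2Needy) false
  else {
    val compare =
      if (s1Needy && s2Needy) java.lang.Double.compare(minShareRatio1, minShareRatio2)
      else java.lang.Double.compare(taskToWeightRatio1, taskToWeightRatio2)
    if (compare != 0) compare < 0 else s1.name < s2.name
  }
}
```

Favouring entities below their minShare first lets every pool reach its guaranteed share before weights decide how the remaining capacity is split.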

=== [[getSchedulableByName]] Finding Schedulable by Name -- getSchedulableByName Method

[source, scala]
----
def getSchedulableByName(schedulableName: String): Schedulable
----

NOTE: getSchedulableByName is part of the Schedulable Contract to find a Schedulable by name.
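Finding a Schedulable by name can be sketched as a lookup in the local schedulableNameToSchedulable registry followed by a search through the children. The recursive search, the `Node` class, its `add` helper and the `Option` return type (used here instead of a possibly-null Schedulable) are all assumptions of this sketch.

```scala
import java.util.concurrent.{ConcurrentHashMap, ConcurrentLinkedQueue}
import scala.jdk.CollectionConverters._

// Hypothetical minimal node with the same two registries as Pool.
class Node(val name: String) {
  val schedulableQueue = new ConcurrentLinkedQueue[Node]
  val schedulableNameToSchedulable = new ConcurrentHashMap[String, Node]

  // Hypothetical helper: register a direct child.
  def add(child: Node): Unit = {
    schedulableQueue.add(child)
    schedulableNameToSchedulable.put(child.name, child)
  }

  // Check the direct children first, then search each subtree in queue
  // order, stopping at the first match (the iterator is lazy).
  def getSchedulableByName(schedulableName: String): Option[Node] =
    Option(schedulableNameToSchedulable.get(schedulableName)).orElse {
      schedulableQueue.asScala.iterator
        .map(_.getSchedulableByName(schedulableName))
        .collectFirst { case Some(found) => found }
    }
}

// Usage: a task set nested one level below the root.
val root = new Node("root")
val production = new Node("production")
val taskSet = new Node("TaskSet_0.0")
production.add(taskSet)
root.add(production)
```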