
== [[FairSchedulableBuilder]] FairSchedulableBuilder -- SchedulableBuilder for FAIR Scheduling Mode

FairSchedulableBuilder is a SchedulableBuilder that is created exclusively for scheduler:TaskSchedulerImpl.md[TaskSchedulerImpl] in FAIR scheduling mode (when the ROOT:configuration-properties.md#spark.scheduler.mode[spark.scheduler.mode] configuration property is FAIR).

[[creating-instance]] FairSchedulableBuilder takes the following to be created:

  • [[rootPool]] Root Pool
  • [[conf]] ROOT:SparkConf.md[SparkConf]

Once the FairSchedulableBuilder is created, TaskSchedulerImpl requests it to build the pools.

[[DEFAULT_SCHEDULER_FILE]] FairSchedulableBuilder uses the pools defined in an allocation pools configuration file. The file is taken from the ROOT:configuration-properties.md#spark.scheduler.allocation.file[spark.scheduler.allocation.file] configuration property or, when the property is not set, from the default fairscheduler.xml (that is <>).

TIP: Use conf/fairscheduler.xml.template as a template for the allocation pools configuration file.

[[DEFAULT_POOL_NAME]] FairSchedulableBuilder always has the default pool defined (and registers it itself unless it is defined in the allocation pools configuration file).

[[FAIR_SCHEDULER_PROPERTIES]] [[spark.scheduler.pool]] FairSchedulableBuilder uses the spark.scheduler.pool local property for the name of the pool to use when requested to add a TaskSetManager (default: the default pool).

NOTE: Use spark-sparkcontext-local-properties.md#setLocalProperty[SparkContext.setLocalProperty] to set properties per thread (aka local properties). Setting the spark.scheduler.pool local property lets FairSchedulableBuilder group jobs submitted from different threads and execute them in a non-default pool.

[source, scala]

scala> :type sc
org.apache.spark.SparkContext

sc.setLocalProperty("spark.scheduler.pool", "production")

// whatever is executed afterwards is submitted to production pool

[[logging]]
[TIP]
====
Enable ALL logging level for org.apache.spark.scheduler.FairSchedulableBuilder logger to see what happens inside.

Add the following line to conf/log4j.properties:

log4j.logger.org.apache.spark.scheduler.FairSchedulableBuilder=ALL

Refer to <>.
====

=== [[allocations-file]] Allocation Pools Configuration File

The allocation pools configuration file is an XML file.

The default conf/fairscheduler.xml.template is as follows:

[source, xml]

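<?xml version="1.0"?>
<allocations>
  <pool name="production">
    <schedulingMode>FAIR</schedulingMode>
    <weight>1</weight>
    <minShare>2</minShare>
  </pool>
  <pool name="test">
    <schedulingMode>FIFO</schedulingMode>
    <weight>2</weight>
    <minShare>3</minShare>
  </pool>
</allocations>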


TIP: The top-level element is named allocations by convention only. Spark does not check the name and accepts any top-level element.

=== [[buildPools]] Building (Tree of) Pools of Schedulables -- buildPools Method

[source, scala]

buildPools(): Unit

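NOTE: buildPools is part of the SchedulableBuilder contract to build a tree of pools.

buildPools builds the pools from the allocation pools configuration file if available and then registers the default pool.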

buildPools prints out the following INFO message to the logs when the configuration file (per the ROOT:configuration-properties.md#spark.scheduler.allocation.file[spark.scheduler.allocation.file] configuration property) could be read:

Creating Fair Scheduler pools from [file]

buildPools prints out the following INFO message to the logs when the ROOT:configuration-properties.md#spark.scheduler.allocation.file[spark.scheduler.allocation.file] configuration property was not used to define the configuration file and the default configuration file is used instead:

Creating Fair Scheduler pools from default file: [DEFAULT_SCHEDULER_FILE]

When neither the ROOT:configuration-properties.md#spark.scheduler.allocation.file[spark.scheduler.allocation.file] configuration property nor the default configuration file could be used, buildPools prints out the following WARN message to the logs:

Fair Scheduler configuration file not found so jobs will be scheduled in FIFO order. To use fair scheduling, configure pools in [DEFAULT_SCHEDULER_FILE] or set spark.scheduler.allocation.file to a file that contains the configuration.
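The three cases above can be summarized in a small decision function. This is only a simplified model of the described behaviour, not Spark's code: `BuildPoolsSketch`, `PoolSource`, `resolve` and `logLine` are hypothetical names, and the sketch assumes that a file named by spark.scheduler.allocation.file is always readable.

```scala
object BuildPoolsSketch {
  // Name of the default allocation file, as described above.
  val DefaultSchedulerFile = "fairscheduler.xml"

  // Where the pool definitions come from (hypothetical model types).
  sealed trait PoolSource
  final case class ExplicitFile(path: String) extends PoolSource
  case object DefaultFile extends PoolSource
  case object NoFile extends PoolSource

  /** Picks the source of pool definitions.
    * `allocationFile` stands in for spark.scheduler.allocation.file and
    * `defaultFileAvailable` for whether the default file could be found. */
  def resolve(allocationFile: Option[String], defaultFileAvailable: Boolean): PoolSource =
    allocationFile match {
      case Some(path)                   => ExplicitFile(path)
      case None if defaultFileAvailable => DefaultFile
      case None                         => NoFile
    }

  /** Returns the log line (level and message) that matches the chosen source. */
  def logLine(source: PoolSource): String = source match {
    case ExplicitFile(path) =>
      s"INFO Creating Fair Scheduler pools from $path"
    case DefaultFile =>
      s"INFO Creating Fair Scheduler pools from default file: $DefaultSchedulerFile"
    case NoFile =>
      "WARN Fair Scheduler configuration file not found so jobs will be scheduled in FIFO order. " +
        s"To use fair scheduling, configure pools in $DefaultSchedulerFile or set " +
        "spark.scheduler.allocation.file to a file that contains the configuration."
  }
}
```

For example, `BuildPoolsSketch.logLine(BuildPoolsSketch.resolve(None, defaultFileAvailable = false))` yields the WARN line, which corresponds to the case where only the default pool ends up registered.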

=== [[addTaskSetManager]] addTaskSetManager Method

[source, scala]

addTaskSetManager(manager: Schedulable, properties: Properties): Unit

NOTE: addTaskSetManager is part of the SchedulableBuilder contract to register a new Schedulable with the root pool.

addTaskSetManager finds the pool name (in the given Properties) under the spark.scheduler.pool property or defaults to the default pool if undefined.

addTaskSetManager then requests the root pool to find the pool by that name.

Unless found, addTaskSetManager creates a new Pool with the default configuration (as if the default pool were used) and requests the root pool to register it. addTaskSetManager then prints out the following WARN message to the logs:

A job was submitted with scheduler pool [poolName], which has not been configured. This can happen when the file that pools are read from isn't set, or when that file doesn't contain [poolName]. Created [poolName] with default configuration (schedulingMode: [mode], minShare: [minShare], weight: [weight])

addTaskSetManager then requests the pool (found or newly-created) to register the given Schedulable.

In the end, addTaskSetManager prints out the following INFO message to the logs:

Added task set [name] tasks to pool [poolName]
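The lookup-or-create flow above can be sketched with a mutable map standing in for the root pool. This is an illustrative model under stated assumptions, not Spark's implementation: `AddTaskSetManagerSketch`, `PoolSketch` and `rootPool` are hypothetical, task sets are represented by their names only, the default pool name is assumed to be `default`, the null check on the properties is a defensive assumption, and the WARN text is abbreviated.

```scala
import java.util.Properties
import scala.collection.mutable

object AddTaskSetManagerSketch {
  val PoolProperty = "spark.scheduler.pool"
  val DefaultPoolName = "default" // assumed name of the default pool

  // Hypothetical stand-in for a pool: its settings plus the task sets added to it.
  final class PoolSketch(val name: String, val mode: String, val minShare: Int, val weight: Int) {
    val taskSets: mutable.Buffer[String] = mutable.Buffer.empty
  }

  // Hypothetical stand-in for the root pool: child pools indexed by name.
  val rootPool: mutable.Map[String, PoolSketch] = mutable.Map.empty

  def addTaskSetManager(taskSetName: String, properties: Properties): PoolSketch = {
    // 1. Read the pool name from the local properties, falling back to the default pool.
    val poolName = Option(properties)
      .map(_.getProperty(PoolProperty, DefaultPoolName))
      .getOrElse(DefaultPoolName)

    // 2. Look the pool up; create and register it with default settings when missing.
    val pool = rootPool.getOrElseUpdate(poolName, {
      println(s"WARN pool $poolName has not been configured, created with defaults") // abbreviated
      new PoolSketch(poolName, "FIFO", 0, 1)
    })

    // 3. Add the task set to the pool (found or newly created).
    pool.taskSets += taskSetName
    println(s"INFO Added task set $taskSetName tasks to pool $poolName")
    pool
  }
}
```

Calling `addTaskSetManager` with properties that set spark.scheduler.pool to an unknown name creates that pool on the fly, which is why a misspelled pool name does not fail but silently runs with default settings.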

=== [[buildDefaultPool]] Registering Default Pool -- buildDefaultPool Method

[source, scala]

buildDefaultPool(): Unit

buildDefaultPool requests the root pool to find the default pool (one with the default name).

Unless already available, buildDefaultPool creates a Pool with the following:

  • default pool name

  • FIFO scheduling mode

  • 0 for the initial minimum share

  • 1 for the initial weight

In the end, buildDefaultPool requests the root pool to register the pool followed by the INFO message in the logs:

Created default pool: [name], schedulingMode: [mode], minShare: [minShare], weight: [weight]

NOTE: buildDefaultPool is used exclusively when FairSchedulableBuilder is requested to build the pools.
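A minimal sketch of this register-if-missing step, using the defaults listed above. `BuildDefaultPoolSketch` and `PoolSketch` are hypothetical, the root pool is modelled as a plain mutable map, and the pool name is assumed to be `default`.

```scala
import scala.collection.mutable

object BuildDefaultPoolSketch {
  // Hypothetical stand-in for a pool and its settings.
  final case class PoolSketch(name: String, mode: String, minShare: Int, weight: Int)

  /** Returns the existing default pool, or creates and registers one with
    * FIFO scheduling mode, minimum share 0 and weight 1. */
  def buildDefaultPool(rootPool: mutable.Map[String, PoolSketch]): PoolSketch =
    rootPool.getOrElseUpdate("default", {
      val pool = PoolSketch("default", "FIFO", 0, 1)
      println(s"INFO Created default pool: ${pool.name}, schedulingMode: ${pool.mode}, " +
        s"minShare: ${pool.minShare}, weight: ${pool.weight}")
      pool
    })
}
```

A default pool that is already registered (for example, one defined in the allocation file with FAIR mode) is returned unchanged, so the file's settings take precedence over the built-in defaults.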

=== [[buildFairSchedulerPool]] Building Pools from XML Allocations File -- buildFairSchedulerPool Internal Method

[source, scala]

buildFairSchedulerPool(is: InputStream, fileName: String): Unit


buildFairSchedulerPool starts by loading the XML file from the given InputStream.

For every pool element, buildFairSchedulerPool creates a Pool with the following:

  • Pool name per name attribute

  • Scheduling mode per schedulingMode element (case-insensitive with FIFO as the default)

  • Initial minimum share per minShare element (default: 0)

  • Initial weight per weight element (default: 1)

In the end, buildFairSchedulerPool requests the root pool to register the created pool followed by the INFO message in the logs:

Created pool: [name], schedulingMode: [mode], minShare: [minShare], weight: [weight]

NOTE: buildFairSchedulerPool is used exclusively when FairSchedulableBuilder is requested to build the pools.
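The parsing rules above (name attribute, case-insensitive schedulingMode, and the defaults for minShare and weight) can be illustrated with the JDK's DOM parser. This is a sketch and not how Spark itself reads the file: `AllocationsParserSketch`, `PoolSpec` and `parse` are hypothetical names, and falling back to the default on an unknown mode or a non-numeric value is an assumption of this sketch.

```scala
import java.io.ByteArrayInputStream
import java.nio.charset.StandardCharsets
import javax.xml.parsers.DocumentBuilderFactory
import org.w3c.dom.Element
import scala.util.Try

object AllocationsParserSketch {
  // Hypothetical value type for one parsed pool element.
  final case class PoolSpec(name: String, schedulingMode: String, minShare: Int, weight: Int)

  // Trimmed text of the first <tag> element under the pool, if present and non-empty.
  private def childText(pool: Element, tag: String): Option[String] = {
    val nodes = pool.getElementsByTagName(tag)
    if (nodes.getLength > 0) Option(nodes.item(0).getTextContent).map(_.trim).filter(_.nonEmpty)
    else None
  }

  def parse(xml: String): Seq[PoolSpec] = {
    val in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8))
    val doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in)
    // Only pool elements are looked up, so the name of the top-level element does not matter.
    val pools = doc.getDocumentElement.getElementsByTagName("pool")
    (0 until pools.getLength).map { i =>
      val pool = pools.item(i).asInstanceOf[Element]
      PoolSpec(
        name = pool.getAttribute("name"),
        // Case-insensitive, FIFO when missing (or, by assumption, when unrecognized).
        schedulingMode = childText(pool, "schedulingMode")
          .map(_.toUpperCase)
          .filter(m => m == "FAIR" || m == "FIFO")
          .getOrElse("FIFO"),
        minShare = childText(pool, "minShare").flatMap(s => Try(s.toInt).toOption).getOrElse(0),
        weight = childText(pool, "weight").flatMap(s => Try(s.toInt).toOption).getOrElse(1))
    }
  }
}
```

A pool element with no child elements, such as `<pool name="adhoc"/>`, therefore comes out as `PoolSpec("adhoc", "FIFO", 0, 1)`, the same settings the default pool gets.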


Last update: 2020-10-06