ExternalClusterManager — Pluggable Cluster Managers

ExternalClusterManager is a contract for pluggable cluster managers. An implementation creates the task scheduler and the scheduler backend that SparkContext uses to schedule tasks.

Support for pluggable cluster managers was introduced in SPARK-13904 (Add support for pluggable cluster manager).

External cluster managers are registered using the java.util.ServiceLoader mechanism (with service markers under the META-INF/services directory). This allows implementations of the ExternalClusterManager interface to be loaded automatically.
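The lookup can be sketched as follows. This is a minimal illustration, not Spark's own code: ClusterManagerLike is a stand-in for the real trait (which is private[spark]), and ClusterManagerLookup and findClusterManager are hypothetical names. Per the java.util.ServiceLoader convention, a provider is registered in a file under META-INF/services named after the fully-qualified name of the service interface, with one implementation class name per line. Rejecting more than one match is a choice made by this sketch.

```scala
import java.util.ServiceLoader
import scala.collection.mutable.ListBuffer

// Stand-in for the real trait (hypothetical; only canCreate is modeled here).
trait ClusterManagerLike {
  def canCreate(masterURL: String): Boolean
}

object ClusterManagerLookup {
  // Hypothetical helper: returns the registered manager that accepts the URL, if any.
  def findClusterManager(masterURL: String): Option[ClusterManagerLike] = {
    // ServiceLoader discovers implementations listed under META-INF/services.
    val candidates = ServiceLoader.load(classOf[ClusterManagerLike]).iterator()
    val matching = ListBuffer.empty[ClusterManagerLike]
    while (candidates.hasNext) {
      val manager = candidates.next()
      if (manager.canCreate(masterURL)) matching += manager
    }
    // More than one match is ambiguous, so fail instead of picking one silently.
    require(matching.size <= 1, s"Multiple cluster managers accept the master URL $masterURL")
    matching.headOption
  }
}
```

With no provider file on the classpath the loader finds nothing, so findClusterManager returns None for any master URL.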

ExternalClusterManager is a private[spark] trait in the org.apache.spark.scheduler package.
The two implementations of the ExternalClusterManager contract in Spark 2.0 are YarnClusterManager and MesosClusterManager.

ExternalClusterManager Contract

canCreate Method

canCreate(masterURL: String): Boolean

canCreate is a mechanism to match an ExternalClusterManager implementation to a given master URL.
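As a minimal sketch of such matching, a manager could claim every master URL that starts with its own scheme. The object name MyClusterManagerSketch and the "mycluster://" scheme are made up for illustration and do not exist in Spark.

```scala
// Hypothetical manager that claims master URLs using the made-up "mycluster://" scheme.
object MyClusterManagerSketch {
  private val Prefix = "mycluster://"

  // Mirrors the shape of canCreate: true only for URLs this manager can handle.
  def canCreate(masterURL: String): Boolean = masterURL.startsWith(Prefix)
}
```

A prefix check is only one option; an implementation could just as well compare the whole URL against a fixed name.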

createTaskScheduler Method

createTaskScheduler(sc: SparkContext, masterURL: String): TaskScheduler

createTaskScheduler creates a TaskScheduler given a SparkContext and the input masterURL.

createSchedulerBackend Method

createSchedulerBackend(sc: SparkContext,
  masterURL: String,
  scheduler: TaskScheduler): SchedulerBackend

createSchedulerBackend creates a SchedulerBackend given a SparkContext, the input masterURL, and TaskScheduler.

Initializing Scheduling Components — initialize Method

initialize(scheduler: TaskScheduler, backend: SchedulerBackend): Unit

initialize is called after the task scheduler and the scheduler backend have been created and initialized separately.

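There is a cyclic dependency between a task scheduler and a scheduler backend that calls for this additional initialization step: createSchedulerBackend takes the TaskScheduler as an argument, so the task scheduler has to exist first and can only be given its backend afterwards.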
TaskScheduler and SchedulerBackend (with DAGScheduler) are commonly referred to as scheduling components.
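The order of calls in the contract can be sketched with simplified stand-ins. Everything in the block is hypothetical (WiringSketch, TaskSchedulerLike, SchedulerBackendLike, SketchTaskScheduler, SketchBackend, run) and only illustrates why the scheduler is created first, the backend second, and the two are tied together last; the real Spark types do far more.

```scala
import scala.collection.mutable.ListBuffer

object WiringSketch {
  // Simplified stand-ins for SchedulerBackend and TaskScheduler (hypothetical).
  trait SchedulerBackendLike { def reviveOffers(): Unit }
  trait TaskSchedulerLike { def submit(task: String): Unit }

  class SketchTaskScheduler(log: ListBuffer[String]) extends TaskSchedulerLike {
    // Unknown at construction time: the backend does not exist yet.
    private var backend: Option[SchedulerBackendLike] = None
    def setBackend(b: SchedulerBackendLike): Unit = backend = Some(b)
    def submit(task: String): Unit = {
      log += s"submitted $task"
      // The scheduler needs the backend to ask for resources.
      backend.getOrElse(sys.error("initialize was not called")).reviveOffers()
    }
  }

  // The backend receives the scheduler up front, so the scheduler must be created first.
  class SketchBackend(val scheduler: TaskSchedulerLike, log: ListBuffer[String])
      extends SchedulerBackendLike {
    def reviveOffers(): Unit = { log += "offers revived"; () }
  }

  // Closes the cycle: hands the backend to the already-created scheduler.
  def initialize(scheduler: TaskSchedulerLike, backend: SchedulerBackendLike): Unit =
    scheduler match {
      case s: SketchTaskScheduler => s.setBackend(backend)
      case _ => sys.error("unsupported scheduler type")
    }

  // Runs the three steps in contract order and returns what happened.
  def run(): List[String] = {
    val log = ListBuffer.empty[String]
    val scheduler: TaskSchedulerLike = new SketchTaskScheduler(log)        // createTaskScheduler
    val backend: SchedulerBackendLike = new SketchBackend(scheduler, log)  // createSchedulerBackend
    initialize(scheduler, backend)                                         // initialize
    scheduler.submit("task-0")
    log.toList
  }
}
```

Calling WiringSketch.run() exercises all three steps in order. Skipping the initialize step in this sketch would make submit fail, because the scheduler would have no backend to talk to.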