= Driver

A Spark driver (aka an application's driver process) is a JVM process that hosts SparkContext.md[SparkContext] for a Spark application. It is the master node in a Spark application.

It is the cockpit of job and task execution (using scheduler:DAGScheduler.md[DAGScheduler] and scheduler:TaskScheduler.md[Task Scheduler]). It hosts the spark-webui.md[Web UI] of the environment.

.Driver with the services
image::spark-driver.png[align="center"]

The driver splits a Spark application into tasks and schedules them to run on executors. It is where the task scheduler lives and spawns tasks across workers, and it coordinates the workers and the overall execution of tasks.

NOTE: spark-shell.md[Spark shell] is a Spark application and runs as the driver. It creates a SparkContext that is available as sc.
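
For example, you can confirm in spark-shell that sc is the driver's SparkContext and look up the driver-hosted web UI address (the values below are illustrative and depend on your environment):

[source, scala]
----
scala> :type sc
org.apache.spark.SparkContext

scala> sc.master
res0: String = local[*]

scala> sc.uiWebUrl
res1: Option[String] = Some(http://192.168.1.4:4040)
----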

The driver requires additional services besides the common ones like shuffle:ShuffleManager.md[], memory:MemoryManager.md[], storage:BlockTransferService.md[] and BroadcastManager (see the sketch after the list):

  • Listener Bus
  • rpc:index.md[]
  • scheduler:MapOutputTrackerMaster.md[] with the name MapOutputTracker
  • storage:BlockManagerMaster.md[] with the name BlockManagerMaster
  • MetricsSystem with the name driver
  • OutputCommitCoordinator
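
A minimal sketch of peeking at some of these driver-side services from a running application, assuming it is evaluated on the driver (e.g. pasted into spark-shell). SparkEnv and most of these services are internal (developer) APIs, so treat this as an exploration aid rather than a stable interface:

[source, scala]
----
import org.apache.spark.SparkEnv

// On the driver, SparkEnv.get returns the driver's SparkEnv with its services.
// These are internal APIs and may change between Spark versions.
val env = SparkEnv.get

env.executorId               // "driver" on the driver
env.mapOutputTracker         // MapOutputTrackerMaster (registered as MapOutputTracker)
env.blockManager.master      // BlockManagerMaster
env.metricsSystem            // MetricsSystem (instance name: driver)
env.outputCommitCoordinator  // OutputCommitCoordinator
----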

CAUTION: FIXME Diagram of RpcEnv for a driver (and later executors). Perhaps it should be in the notes about RpcEnv?

  • High-level control flow of work
  • Your Spark application runs as long as the Spark driver. Once the driver terminates, so does your Spark application.
  • Creates SparkContext, RDDs, and executes transformations and actions (see the sketch after this list)
  • Launches scheduler:Task.md[tasks]
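
The following is a minimal sketch of a self-contained driver program that follows this flow. The object and application name are made up for illustration, and the master URL is assumed to be provided by spark-submit:

[source, scala]
----
import org.apache.spark.{SparkConf, SparkContext}

object MyDriverApp {
  def main(args: Array[String]): Unit = {
    // The driver process hosts the SparkContext for the whole application
    val conf = new SparkConf().setAppName("MyDriverApp")
    val sc = new SparkContext(conf)

    // Transformations are only planned on the driver; tasks run on executors
    val counts = sc.parallelize(1 to 100)
      .map(n => (n % 10, 1))
      .reduceByKey(_ + _)   // wide transformation
      .collect()            // action: the driver schedules jobs and tasks

    counts.foreach(println)

    // Once the driver terminates, so does the Spark application
    sc.stop()
  }
}
----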

=== [[driver-memory]] Driver's Memory

It can be set first using spark-submit/index.md#command-line-options[spark-submit's --driver-memory] command-line option or <<spark_driver_memory, spark.driver.memory>> and falls back to spark-submit/index.md#environment-variables[SPARK_DRIVER_MEMORY] if not set earlier.

NOTE: It is printed out to the standard error output in spark-submit/index.md#verbose-mode[spark-submit's verbose mode].
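
As a quick sanity check, the effective setting (if any was given) is visible in the driver's SparkConf. The 2g value below is just an assumed example for an application submitted with --driver-memory 2g:

[source, scala]
----
scala> sc.getConf.getOption("spark.driver.memory")
res0: Option[String] = Some(2g)
----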

=== [[driver-cores]] Driver Cores

It can be set using spark-submit/index.md#driver-cores[spark-submit's --driver-cores] command-line option (or the <<spark_driver_cores, spark.driver.cores>> property) in cluster deploy mode.

NOTE: In client deploy mode the driver's cores correspond to the cores of the JVM process the Spark application runs in, so the setting has no effect.

NOTE: It is printed out to the standard error output in spark-submit/index.md#verbose-mode[spark-submit's verbose mode].
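
Similarly, the number of driver cores can be read from the driver's SparkConf. Assuming client deploy mode with nothing set, the default of 1 is returned:

[source, scala]
----
scala> sc.getConf.getInt("spark.driver.cores", 1)
res0: Int = 1
----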

=== [[settings]] Settings

.Spark Properties
[cols="1,1,2",options="header",width="100%"]
|===
| Spark Property | Default Value | Description

| [[spark_driver_blockManager_port]] spark.driver.blockManager.port
| storage:BlockManager.md#spark_blockManager_port[spark.blockManager.port]
| Port to use for the storage:BlockManager.md[BlockManager] on the driver.

More precisely, spark.driver.blockManager.port is used when core:SparkEnv.md#NettyBlockTransferService[NettyBlockTransferService is created] (while SparkEnv is created for the driver).

| [[spark_driver_memory]] spark.driver.memory
| 1g
| The driver's memory size (in MiBs).

Refer to <<driver-memory, Driver's Memory>>.

| [[spark_driver_cores]] spark.driver.cores
| 1
| The number of CPU cores assigned to the driver in cluster deploy mode.

NOTE: When yarn/spark-yarn-client.md#creating-instance[Client is created] (for Spark on YARN in cluster mode only), it sets the number of cores for ApplicationMaster using spark.driver.cores.

Refer to <<driver-cores, Driver Cores>>.

| [[spark_driver_extraLibraryPath]] spark.driver.extraLibraryPath
|
|

| [[spark_driver_extraJavaOptions]] spark.driver.extraJavaOptions
|
| Additional JVM options for the driver.

| [[spark.driver.appUIAddress]] spark.driver.appUIAddress
|
| spark.driver.appUIAddress is used exclusively in yarn/README.md[Spark on YARN]. It is set when yarn/spark-yarn-client-yarnclientschedulerbackend.md#start[YarnClientSchedulerBackend starts] to yarn/spark-yarn-applicationmaster.md#runExecutorLauncher[run ExecutorLauncher] (and yarn/spark-yarn-applicationmaster.md#registerAM[register ApplicationMaster] for the Spark application).

| [[spark_driver_libraryPath]] spark.driver.libraryPath
|
|

|===

=== [[spark_driver_extraClassPath]] spark.driver.extraClassPath

The spark.driver.extraClassPath system property sets the additional classpath entries (e.g. jars and directories) that should be added to the driver's classpath in cluster deploy mode.

[NOTE]
====
For client deploy mode you can use a properties file or command line to set spark.driver.extraClassPath.

Do not use SparkConf.md[SparkConf] since it is too late for client deploy mode given the JVM has already been set up to start a Spark application.

Refer to spark-class.md#buildSparkSubmitCommand[buildSparkSubmitCommand Internal Method] for the very low-level details of how it is handled internally.
====

spark.driver.extraClassPath uses an OS-specific path separator.
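
The separator is what the JVM reports as java.io.File.pathSeparator, i.e. a colon on Linux and macOS (as shown below) and a semicolon on Windows:

[source, scala]
----
scala> java.io.File.pathSeparator
res0: String = :
----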

NOTE: Use spark-submit's spark-submit/index.md#driver-class-path[--driver-class-path command-line option] on command line to override spark.driver.extraClassPath from a spark-properties.md#spark-defaults-conf[Spark properties file].