Skip to content


A Spark driver (aka an application's driver process) is a JVM process that hosts[SparkContext] for a Spark application. It is the master node in a Spark application.

It is the cockpit of jobs and tasks execution (using[DAGScheduler] and[Task Scheduler]). It hosts[Web UI] for the environment.

.Driver with the services image::spark-driver.png[align="center"]

It splits a Spark application into tasks and schedules them to run on executors.

A driver is where the task scheduler lives and spawns tasks across workers.

A driver coordinates workers and overall execution of tasks.

NOTE:[Spark shell] is a Spark application and the driver. It creates a SparkContext that is available as sc.

Driver requires the additional services (beside the common ones like[],[],[], BroadcastManager:

  • Listener Bus
  •[] with the name MapOutputTracker
  •[] with the name BlockManagerMaster
  • MetricsSystem with the name driver
  • OutputCommitCoordinator

CAUTION: FIXME Diagram of RpcEnv for a driver (and later executors). Perhaps it should be in the notes about RpcEnv?

  • High-level control flow of work
  • Your Spark application runs as long as the Spark driver. ** Once the driver terminates, so does your Spark application.
  • Creates SparkContext, RDD's, and executes transformations and actions
  • Launches[tasks]

=== [[driver-memory]] Driver's Memory

It can be set first using[spark-submit's --driver-memory] command-line option or <> and falls back to[SPARK_DRIVER_MEMORY] if not set earlier.

NOTE: It is printed out to the standard error output in[spark-submit's verbose mode].

=== [[driver-memory]] Driver's Cores

It can be set first using[spark-submit's --driver-cores] command-line option for[cluster deploy mode].

NOTE: In[client deploy mode] the driver's memory corresponds to the memory of the JVM process the Spark application runs on.

NOTE: It is printed out to the standard error output in[spark-submit's verbose mode].

=== [[settings]] Settings

.Spark Properties [cols="1,1,2",options="header",width="100%"] |=== | Spark Property | Default Value | Description | [[spark_driver_blockManager_port]] spark.driver.blockManager.port |[spark.blockManager.port] | Port to use for the[BlockManager] on the driver.

More precisely, spark.driver.blockManager.port is used when[NettyBlockTransferService is created] (while SparkEnv is created for the driver).

| [[spark_driver_memory]] spark.driver.memory | 1g | The driver's memory size (in MiBs).

Refer to <>.

| [[spark_driver_cores]] spark.driver.cores | 1 | The number of CPU cores assigned to the driver in[cluster deploy mode].

NOTE: When yarn/[Client is created] (for Spark on YARN in cluster mode only), it sets the number of cores for ApplicationManager using spark.driver.cores.

Refer to <>.

| [[spark_driver_extraLibraryPath]] spark.driver.extraLibraryPath | |

| [[spark_driver_extraJavaOptions]] spark.driver.extraJavaOptions | | Additional JVM options for the driver.

| [[spark.driver.appUIAddress]] spark.driver.appUIAddress

spark.driver.appUIAddress is used exclusively in yarn/[Spark on YARN]. It is set when yarn/[YarnClientSchedulerBackend starts] to yarn/[run ExecutorLauncher] (and yarn/[register ApplicationMaster] for the Spark application).

| [[spark_driver_libraryPath]] spark.driver.libraryPath | |


==== [[spark_driver_extraClassPath]] spark.driver.extraClassPath

spark.driver.extraClassPath system property sets the additional classpath entries (e.g. jars and directories) that should be added to the driver's classpath in[cluster deploy mode].


For[client deploy mode] you can use a properties file or command line to set spark.driver.extraClassPath.

Do not use[SparkConf] since it is too late for client deploy mode given the JVM has already been set up to start a Spark application.

Refer to[buildSparkSubmitCommand Internal Method] for the very low-level details of how it is handled internally.

spark.driver.extraClassPath uses a OS-specific path separator.

NOTE: Use spark-submit's[--driver-class-path command-line option] on command line to override spark.driver.extraClassPath from a[Spark properties file].

Back to top