Spark Configuration Properties




Port for block managers to listen on when a more specific setting (i.e. spark.driver.blockManager.port for the driver) is not provided.

Default: 0

In Spark on Kubernetes, the default port is 7079.


Number of partitions to use for HashPartitioner

spark.default.parallelism corresponds to the default parallelism of a scheduler backend and so varies with the scheduler backend in use.


Default: 64
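As a sketch of how the default parallelism varies with the scheduler backend: the max(totalCores, 2) floor mirrors CoarseGrainedSchedulerBackend, and local mode uses the number of cores. The helper and backend names below are illustrative, not Spark API.

```python
# Hypothetical sketch of resolving default parallelism per scheduler
# backend. An explicit spark.default.parallelism always wins.
def default_parallelism(conf: dict, backend: str, total_cores: int) -> int:
    explicit = conf.get("spark.default.parallelism")
    if explicit is not None:
        return int(explicit)
    if backend == "coarse-grained":  # e.g. Spark Standalone, YARN
        return max(total_cores, 2)   # floor of 2, as in CoarseGrainedSchedulerBackend
    return total_cores               # local mode: number of cores

print(default_parallelism({}, "coarse-grained", 1))  # 2 (floored)
print(default_parallelism({"spark.default.parallelism": "64"}, "local", 8))  # 64
```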


Port the block manager on the driver listens on


The maximum size of all results of the tasks in a TaskSet

Default: 1g



User-defined class path for executors, i.e. URLs representing user-defined class path entries that are added to an executor’s class path. URLs are separated by system-dependent path separator, i.e. : on Unix-like systems and ; on Microsoft Windows.

Default: (empty)



Number of cores of an Executor


Extra Java options of an Executor

Used when Spark on YARN’s ExecutorRunnable is requested to prepare the command to launch CoarseGrainedExecutorBackend in a YARN container


Extra library paths separated by system-dependent path separator, i.e. : on Unix/MacOS systems and ; on Microsoft Windows

Used when Spark on YARN’s ExecutorRunnable is requested to prepare the command to launch CoarseGrainedExecutorBackend in a YARN container








Interval at which an Executor reports heartbeat and metrics for active tasks to the driver

Default: 10s


Number of times an Executor will try to send heartbeats to the driver before it gives up and exits (with exit code 56).

Default: 60


Number of Executors in use

Default: 0

Initial per-task memory size needed to store a block in memory.

Default: 1024 * 1024

Used when MemoryStore is requested to putIteratorAsValues and putIteratorAsBytes




Flag to control whether to load classes in user jars before those in Spark jars

Default: false


Amount of memory to use for an Executor

Default: 1g

Equivalent to SPARK_EXECUTOR_MEMORY environment variable.





For locality-aware delay scheduling for PROCESS_LOCAL, NODE_LOCAL, and RACK_LOCAL TaskLocalities when a locality-specific setting is not set.

Default: 3s


Scheduling delay for NODE_LOCAL TaskLocality

Default: The value of spark.locality.wait


Scheduling delay for PROCESS_LOCAL TaskLocality

Default: The value of spark.locality.wait


Scheduling delay for RACK_LOCAL TaskLocality

Default: The value of spark.locality.wait
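The fallback described in the entries above (each locality-specific wait defaults to spark.locality.wait) can be sketched as follows; the helper name is an assumption, not Spark API.

```python
# Sketch of the per-locality fallback: spark.locality.wait.{process,node,rack}
# each default to the value of spark.locality.wait (itself defaulting to 3s).
def locality_wait(conf: dict, level: str) -> str:
    default = conf.get("spark.locality.wait", "3s")
    return conf.get(f"spark.locality.wait.{level}", default)

conf = {"spark.locality.wait": "5s", "spark.locality.wait.rack": "0s"}
print(locality_wait(conf, "node"))  # 5s (falls back to spark.locality.wait)
print(locality_wait(conf, "rack"))  # 0s (explicit setting wins)
```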


How frequently to reprint duplicate exceptions in full (in millis).

Default: 10000


Master URL to connect a Spark application to


Path to the configuration file of FairSchedulableBuilder

Default: fairscheduler.xml (on a Spark application’s class path)


How long to wait before a task can be re-launched on the executor where it once failed. This prevents repeated task failures due to executor failures.

Default: 0L


Scheduling Mode of the TaskSchedulerImpl, i.e. case-insensitive name of the scheduling mode that TaskSchedulerImpl uses to choose between the available SchedulableBuilders for task scheduling (of tasks of jobs submitted for execution to the same SparkContext)

Default: FIFO

Supported values:

  • FAIR for fair sharing (of cluster resources)

  • FIFO (default) for queueing jobs one after another

Task scheduling is an algorithm that is used to assign cluster resources (CPU cores and memory) to tasks (that are part of jobs with one or more stages). Fair sharing allows for executing tasks of different jobs at the same time (that were all submitted to the same SparkContext). In FIFO scheduling mode a single SparkContext can submit only a single job for execution (regardless of how many cluster resources the job really uses, which could lead to inefficient utilization of cluster resources and a longer overall execution of the Spark application).

Scheduling mode is particularly useful in multi-tenant environments in which a single SparkContext could be shared across different users (to make cluster resource utilization more efficient).

Use the web UI to find the current scheduling mode (e.g. the Environment tab as part of Spark Properties and the Jobs tab as Scheduling Mode).
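The case-insensitive validation of the mode name described above can be sketched as follows (Spark's TaskSchedulerImpl does the equivalent via SchedulingMode.withName; the helper here is illustrative).

```python
# Sketch of validating a case-insensitive spark.scheduler.mode value
# against the supported scheduling modes.
SUPPORTED_MODES = {"FIFO", "FAIR"}

def scheduling_mode(conf: dict) -> str:
    mode = conf.get("spark.scheduler.mode", "FIFO").upper()
    if mode not in SUPPORTED_MODES:
        raise ValueError(f"Unsupported spark.scheduler.mode: {mode}")
    return mode

print(scheduling_mode({"spark.scheduler.mode": "fair"}))  # FAIR
print(scheduling_mode({}))                                # FIFO
```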


Threshold above which Spark warns a user that an initial TaskSet may be starved

Default: 15s


The number of CPU cores used to schedule (allocate for) a task

Default: 1



The number of individual task failures before giving up on the entire TaskSet and the job afterwards




spark.memory.offHeap.size is the absolute amount of memory in bytes which can be used for off-heap allocation. This setting has no impact on heap memory usage, so if your executors' total memory consumption must fit within some hard limit then be sure to shrink your JVM heap size accordingly.

Default: 0

Must be set to a positive value when spark.memory.offHeap.enabled is enabled (true).

Must not be negative


spark.memory.storageFraction controls the fraction of the memory to use for storage region.

Default: 0.5


spark.memory.fraction is the fraction of JVM heap space used for execution and storage.

Default: 0.6
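The arithmetic behind spark.memory.fraction and spark.memory.storageFraction can be sketched as back-of-the-envelope code. The 300 MB reserved-memory constant matches Spark's RESERVED_SYSTEM_MEMORY_BYTES in UnifiedMemoryManager; the function itself is illustrative.

```python
# Sketch of UnifiedMemoryManager's region sizing: usable heap (after a
# 300 MB reservation) is split by spark.memory.fraction into a unified
# execution+storage pool, of which spark.memory.storageFraction is the
# storage region (storage and execution can borrow from each other).
RESERVED_MB = 300

def unified_memory_regions(heap_mb: int, fraction=0.6, storage_fraction=0.5):
    usable = heap_mb - RESERVED_MB
    unified = usable * fraction           # execution + storage pool
    storage = unified * storage_fraction  # storage region
    execution = unified - storage         # execution region
    return unified, storage, execution

unified, storage, execution = unified_memory_regions(1024)
print(round(unified, 1), round(storage, 1))  # 434.4 217.2
```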


spark.memory.useLegacyMode controls the type of the MemoryManager to use. When enabled (true), the legacy StaticMemoryManager is used; otherwise (false), UnifiedMemoryManager.

Default: false


spark.memory.offHeap.enabled controls whether Spark will attempt to use off-heap memory for certain operations (true) or not (false).

Default: false

Controls whether Tungsten memory will be allocated on the JVM heap or off-heap (using sun.misc.Unsafe).

If enabled, spark.memory.offHeap.size has to be greater than 0.

Used when MemoryManager is requested for tungstenMemoryMode.


Size of the in-memory buffer for each shuffle file output stream, in KiB unless otherwise specified. These buffers reduce the number of disk seeks and system calls made in creating intermediate shuffle files.

Default: 32k

Must be greater than 0 and less than or equal to 2097151 ((Integer.MAX_VALUE - 15) / 1024)
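The bound above can be checked mechanically. Below is a sketch of parsing a "32k"-style size string into KiB and enforcing the documented cap of 2097151 KiB ((Integer.MAX_VALUE - 15) / 1024); the helper is an assumption, not Spark's own parser.

```python
# Sketch: parse a size string (KiB unless otherwise specified) and
# validate it against the documented upper bound.
MAX_BUFFER_KIB = (2**31 - 1 - 15) // 1024  # (Integer.MAX_VALUE - 15) / 1024

def parse_buffer_kib(value: str) -> int:
    units = {"k": 1, "m": 1024, "g": 1024 * 1024}
    suffix = value[-1].lower()
    kib = int(value[:-1]) * units[suffix] if suffix in units else int(value)
    if not (0 < kib <= MAX_BUFFER_KIB):
        raise ValueError(f"buffer size {kib} KiB out of range")
    return kib

print(parse_buffer_kib("32k"))  # 32
print(MAX_BUFFER_KIB)           # 2097151
```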


Size of object batches when reading or writing from serializers.

Default: 10000


Initial threshold for the size of an in-memory collection

Default: 5 * 1024 * 1024

Used by Spillable


(internal) The maximum number of elements in memory before forcing the shuffle sorter to spill. Claimed to be used for testing only

Default: Integer.MAX_VALUE

With the default value the sorter is never forced to spill until other limits are reached, like the max page size limit for the pointer array in the sorter.

Used when:

  • ShuffleExternalSorter is created

  • Spillable is requested to maybeSpill


Specifies the fully-qualified class name or the alias of the ShuffleManager in a Spark application

Default: sort

The supported aliases:

  • sort

  • tungsten-sort

Used when SparkEnv object is requested to create a "base" SparkEnv for a driver or an executor
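Both supported aliases resolve to the same implementation. The sketch below mirrors the alias map SparkEnv keeps for short shuffle-manager names; treat the Python structure itself as illustrative.

```python
# Sketch: resolve a spark.shuffle.manager value, where the sort and
# tungsten-sort aliases both map to SortShuffleManager and any other
# value is treated as a fully-qualified class name.
SHUFFLE_MANAGER_ALIASES = {
    "sort": "org.apache.spark.shuffle.sort.SortShuffleManager",
    "tungsten-sort": "org.apache.spark.shuffle.sort.SortShuffleManager",
}

def resolve_shuffle_manager(conf: dict) -> str:
    name = conf.get("spark.shuffle.manager", "sort")
    return SHUFFLE_MANAGER_ALIASES.get(name.lower(), name)

print(resolve_shuffle_manager({}))  # org.apache.spark.shuffle.sort.SortShuffleManager
```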


Default: 8


The size threshold of serialized shuffle map output statuses that MapOutputTrackerMaster uses to determine whether to use a broadcast variable to send them to executors

Default: 512k

Must be below spark.rpc.message.maxSize (to prevent sending an RPC message that is too large)


Maximum allowed message size for RPC communication (in MB unless otherwise specified)

Default: 128

Generally only applies to map output size (serialized) information sent between executors and the driver.

Increase this if you are running jobs with many thousands of map and reduce tasks and see messages about the RPC message size.
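The constraint from the previous entry (the broadcast threshold must stay below spark.rpc.message.maxSize) can be sketched as a unit-aware check; both helpers are assumptions, not Spark API.

```python
# Sketch: check that a "512k"-style minSizeForBroadcast stays below
# spark.rpc.message.maxSize (which is expressed in MB).
def to_bytes(size: str) -> int:
    units = {"k": 1024, "m": 1024**2, "g": 1024**3}
    s = size.lower()
    return int(s[:-1]) * units[s[-1]] if s[-1] in units else int(s)

def validate(min_size_for_broadcast="512k", rpc_max_size_mb=128):
    assert to_bytes(min_size_for_broadcast) < rpc_max_size_mb * 1024**2, \
        "minSizeForBroadcast must be below spark.rpc.message.maxSize"

validate()  # the defaults (512k vs 128 MB) satisfy the constraint
```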


(internal) Minimum number of partitions (threshold) above which MapStatus creates a HighlyCompressedMapStatus (over CompressedMapStatus) when requested for one (for ShuffleWriters).

Default: 2000

Must be a positive integer (above 0)


Enables locality preferences for reduce tasks

Default: true

When enabled (true), MapOutputTrackerMaster will compute the preferred hosts on which to run a given map output partition in a given shuffle, i.e. the nodes that the most outputs for that partition are on.


Maximum number of reduce partitions at or below which SortShuffleManager avoids merge-sorting data when there is no map-side aggregation

Default: 200
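The decision can be sketched as follows (a simplified rendering of SortShuffleWriter.shouldBypassMergeSort; the real code consults the ShuffleDependency).

```python
# Sketch: bypass the merge-sort path only when there is no map-side
# aggregation and the number of reduce partitions is at or below the
# spark.shuffle.sort.bypassMergeThreshold value.
def should_bypass_merge_sort(map_side_combine: bool,
                             num_partitions: int,
                             bypass_threshold: int = 200) -> bool:
    if map_side_combine:
        return False  # aggregation requires the sort path
    return num_partitions <= bypass_threshold

print(should_bypass_merge_sort(False, 150))  # True
print(should_bypass_merge_sort(True, 150))   # False
```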


Initial buffer size for sorting

Default: 4096

Used exclusively when UnsafeShuffleWriter is requested to open (and creates a ShuffleExternalSorter)


Controls whether DiskBlockObjectWriter should force outstanding writes to disk while committing a single atomic block, i.e. all operating system buffers should synchronize with the disk to ensure that all changes to a file are in fact recorded in the storage.

Default: false

Used when BlockManager is requested for a DiskBlockObjectWriter


Size of the in-memory buffer for the file output stream used by the unsafe shuffle writer after each partition is written. In KiB unless otherwise specified.

Default: 32k

Must be greater than 0 and less than or equal to 2097151 ((Integer.MAX_VALUE - 15) / 1024)


Time (in ms) between resource offer revives

Default: 1s


Minimum ratio of (registered resources / total expected resources) before submitting tasks

Default: 0


Time to wait for sufficient resources to become available

Default: 30s


When enabled (true), copying data between two Java FileInputStreams uses Java FileChannels (Java NIO) to improve copy performance.

Default: true


Controls whether to use the External Shuffle Service

Default: false

When enabled (true), the driver registers itself with the shuffle service.


Default: 7337


Controls whether to compress shuffle output when stored

Default: true


Enables fast merge strategy for UnsafeShuffleWriter to merge spill files.

Default: true


Controls whether to compress RDD partitions when stored serialized.

Default: false


Controls whether to compress shuffle output temporarily spilled to disk.

Default: true


Default: 5

Controls whether to use IO encryption

Default: false


Default: org.apache.spark.serializer.JavaSerializer


Default: org.apache.spark.serializer.JavaSerializer

The default CompressionCodec

Default: lz4

The block size of the LZ4CompressionCodec

Default: 32k

The block size of the SnappyCompressionCodec

Default: 32k

The buffer size of the BufferedOutputStream of the ZStdCompressionCodec

Default: 32k

The buffer is used to avoid the overhead of excessive JNI calls while compressing or uncompressing small amounts of data

The compression level of the ZStdCompressionCodec

Default: 1

The default level is the fastest of all, with a reasonably high compression ratio
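The codec entries above all hang off spark.io.compression.codec, which accepts either a short alias or a fully-qualified class name. The alias-to-class mapping below matches the short names Spark's CompressionCodec accepts; the Python structure itself is illustrative.

```python
# Sketch: resolve a spark.io.compression.codec alias to its
# implementation class, passing through fully-qualified names.
CODEC_ALIASES = {
    "lz4": "org.apache.spark.io.LZ4CompressionCodec",
    "lzf": "org.apache.spark.io.LZFCompressionCodec",
    "snappy": "org.apache.spark.io.SnappyCompressionCodec",
    "zstd": "org.apache.spark.io.ZStdCompressionCodec",
}

def resolve_codec(conf: dict) -> str:
    name = conf.get("spark.io.compression.codec", "lz4")
    return CODEC_ALIASES.get(name.lower(), name)

print(resolve_codec({}))  # org.apache.spark.io.LZ4CompressionCodec
```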


Default: 65536


Enables cleaning checkpoint files when a checkpointed reference is out of scope

Default: false


Controls how often to trigger a garbage collection

Default: 30min


Controls whether to enable ContextCleaner

Default: true


Controls whether the cleaning thread should block on cleanup tasks (other than shuffle, which is controlled by spark.cleaner.referenceTracking.blocking.shuffle)

Default: true


Controls whether the cleaning thread should block on shuffle cleanup tasks.

Default: false


The size of a block (in kB unless the unit is specified)

Default: 4m


Controls broadcast compression

Default: true

Unique identifier of a Spark application that Spark uses to uniquely identify metric sources.

Set when SparkContext is created (right after TaskScheduler is started, which actually provides the identifier).

Application Name

Default: (undefined)


Timeout to use for the default RPC remote endpoint lookup

Default: 120s


Number of attempts to send a message to and receive a response from a remote endpoint.

Default: 3


Time to wait between retries.

Default: 3s


Timeout for RPC ask calls

Default: 120s

Network timeout to use for RPC remote endpoint lookup. Fallback for spark.rpc.askTimeout.

Default: 120s
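The fallback relationship described above (the network timeout acts as a fallback for the RPC ask timeout, with 120s as the final default) can be sketched as a lookup chain; the helper name is an assumption, not Spark API.

```python
# Sketch: resolve the effective RPC ask timeout, falling back from
# spark.rpc.askTimeout to spark.network.timeout, then to 120s.
def ask_timeout(conf: dict) -> str:
    return conf.get("spark.rpc.askTimeout",
                    conf.get("spark.network.timeout", "120s"))

print(ask_timeout({}))                                 # 120s
print(ask_timeout({"spark.network.timeout": "200s"}))  # 200s
```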