Dynamic Allocation of Executors
Dynamic Allocation of Executors (Dynamic Resource Allocation or Elastic Scaling) is a Spark service for adding and removing Spark executors dynamically on demand to match workload.
Unlike the "traditional" static allocation, where a Spark application reserves CPU and memory resources upfront (irrespective of how much it may eventually use), with dynamic allocation you get as much as needed and no more. It scales the number of executors up and down based on workload, i.e. idle executors are removed, and when there are pending tasks waiting for executors to run on, dynamic allocation requests new ones.
Dynamic Allocation is enabled (and SparkContext creates an ExecutorAllocationManager) when:

- spark.dynamicAllocation.enabled configuration property is enabled
- spark.master is non-local
ExecutorAllocationManager is the heart of Dynamic Resource Allocation.
When enabled, it is recommended to use the External Shuffle Service.
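As an illustration, dynamic allocation and the External Shuffle Service can be enabled together in spark-defaults.conf (the bounds and timeout below are example values, not defaults):

```
spark.dynamicAllocation.enabled              true
spark.shuffle.service.enabled                true
spark.dynamicAllocation.minExecutors         2
spark.dynamicAllocation.initialExecutors     4
spark.dynamicAllocation.maxExecutors         20
spark.dynamicAllocation.executorIdleTimeout  60s
```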
Dynamic Allocation comes with the policy of scaling executors up and down as follows:
- Scale Up Policy requests new executors when there are pending tasks and increases the number of executors exponentially, since executors start slowly and the Spark application may need slightly more than requested.
- Scale Down Policy removes executors that have been idle for spark.dynamicAllocation.executorIdleTimeout seconds.
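The two policies can be sketched in plain Scala. This is a simplified model, not Spark's actual implementation: the ramp-up doubles the number of executors to add per round (capped by pending tasks), and an executor is removable once idle past the timeout.

```scala
// A minimal sketch (not Spark's actual code) of the two allocation policies.
object AllocationPolicySketch {
  // Scale Up: double the request on every round, but never ask for
  // more executors than there are pending tasks.
  def executorsToAdd(pendingTasks: Int, lastAdded: Int): Int =
    math.min(pendingTasks, math.max(1, lastAdded * 2))

  // Scale Down: an executor is removable once it has been idle longer
  // than spark.dynamicAllocation.executorIdleTimeout.
  def isRemovable(idleSeconds: Long, idleTimeoutSeconds: Long): Boolean =
    idleSeconds >= idleTimeoutSeconds

  def main(args: Array[String]): Unit = {
    // Ramp-up rounds with 5 pending tasks: 1, 2, 4, then capped at 5.
    println(executorsToAdd(pendingTasks = 5, lastAdded = 0)) // 1
    println(executorsToAdd(pendingTasks = 5, lastAdded = 1)) // 2
    println(executorsToAdd(pendingTasks = 5, lastAdded = 2)) // 4
    println(executorsToAdd(pendingTasks = 5, lastAdded = 4)) // 5 (capped)
    println(isRemovable(idleSeconds = 75, idleTimeoutSeconds = 60)) // true
  }
}
```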
Performance Metrics
ExecutorAllocationManagerSource metric source is used to report performance metrics.
SparkContext.killExecutors
SparkContext.killExecutors is unsupported with Dynamic Allocation enabled.
Programmable Dynamic Allocation
SparkContext
offers a developer API to scale executors up or down.
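A sketch of that developer API is shown below. It requires a running cluster; the master URL and executor ID are placeholders:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Placeholder master URL; the calls below are no-ops in local mode.
val sc = new SparkContext(
  new SparkConf().setAppName("scaling-demo").setMaster("yarn"))

// Ask the cluster manager for 2 additional executors
// (returns false if the request cannot be forwarded).
sc.requestExecutors(numAdditionalExecutors = 2)

// Ask for an exact total of 4 executors.
sc.requestTotalExecutors(
  numExecutors = 4,
  localityAwareTasks = 0,
  hostToLocalTaskCount = Map.empty)

// Release a specific executor ("1" is a placeholder executor ID).
// Unsupported when Dynamic Allocation is enabled (see below).
sc.killExecutors(Seq("1"))
```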
Getting Initial Number of Executors for Dynamic Allocation
getDynamicAllocationInitialExecutors(conf: SparkConf): Int
getDynamicAllocationInitialExecutors first makes sure that spark.dynamicAllocation.initialExecutors is not less than spark.dynamicAllocation.minExecutors.
If not, you should see the following WARN message in the logs:
spark.dynamicAllocation.initialExecutors less than spark.dynamicAllocation.minExecutors is invalid, ignoring its setting, please update your configs.
getDynamicAllocationInitialExecutors then makes sure that spark.executor.instances is not less than spark.dynamicAllocation.minExecutors.

NOTE: Both spark.executor.instances and spark.dynamicAllocation.initialExecutors fall back to 0 when not defined explicitly.
If not, you should see the following WARN message in the logs:
spark.executor.instances less than spark.dynamicAllocation.minExecutors is invalid, ignoring its setting, please update your configs.
getDynamicAllocationInitialExecutors
sets the initial number of executors to be the maximum of:
- spark.dynamicAllocation.minExecutors
- spark.dynamicAllocation.initialExecutors
- spark.executor.instances
- 0
You should see the following INFO message in the logs:
Using initial executors = [initialExecutors], max of spark.dynamicAllocation.initialExecutors, spark.dynamicAllocation.minExecutors and spark.executor.instances
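The computation boils down to a simple maximum. A standalone sketch in plain Scala (without a real SparkConf; each parameter stands in for its configuration property, with 0 mimicking an unset value):

```scala
// Initial executors = max of the three configuration properties and 0.
def initialExecutors(
    minExecutors: Int,       // spark.dynamicAllocation.minExecutors
    initial: Int,            // spark.dynamicAllocation.initialExecutors
    executorInstances: Int   // spark.executor.instances
): Int =
  Seq(minExecutors, initial, executorInstances, 0).max

println(initialExecutors(1, 3, 0)) // 3
println(initialExecutors(0, 0, 0)) // 0
```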
getDynamicAllocationInitialExecutors
is used when ExecutorAllocationManager
is requested to set the initial number of executors.
Resources
Documentation
- Dynamic Allocation in the official documentation of Apache Spark
- Dynamic allocation in the documentation of Cloudera Data Platform (CDP)
Slides
- Dynamic Allocation in Spark by Databricks