YarnSparkHadoopUtil is…FIXME

YarnSparkHadoopUtil can only be created when the SPARK_YARN_MODE flag is enabled.

YarnSparkHadoopUtil belongs to the org.apache.spark.deploy.yarn package.

Enable DEBUG logging level for org.apache.spark.deploy.yarn.YarnSparkHadoopUtil logger to see what happens inside.

Add the following line to conf/log4j.properties:

log4j.logger.org.apache.spark.deploy.yarn.YarnSparkHadoopUtil=DEBUG

Refer to Logging.

startCredentialUpdater Method


Getting YarnSparkHadoopUtil Instance — get Method


addPathToEnvironment Method

addPathToEnvironment(env: HashMap[String, String], key: String, value: String): Unit
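Based on the signature alone, the method's behavior can be sketched in plain Scala: it appends value to the entry under key (or creates the entry when absent). This is a hedged approximation, not the exact Spark implementation; the real method uses YARN's class-path separator, assumed here to be java.io.File.pathSeparator.

```scala
import scala.collection.mutable.HashMap

// Sketch: append `value` to an existing environment entry (separated by the
// platform path separator) or create the entry when the key is absent.
def addPathToEnvironment(env: HashMap[String, String], key: String, value: String): Unit = {
  val newValue =
    if (env.contains(key)) env(key) + java.io.File.pathSeparator + value
    else value
  env.put(key, newValue)
}

val env = HashMap("CLASSPATH" -> "/jars/a.jar")
addPathToEnvironment(env, "CLASSPATH", "/jars/b.jar")
addPathToEnvironment(env, "PYTHONPATH", "/py")
println(env("CLASSPATH"))   // the two jars joined by the path separator
println(env("PYTHONPATH"))  // a newly-created entry
```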





getApplicationAclsForYarn Method



MEMORY_OVERHEAD_FACTOR is a constant equal to 0.10, i.e. a 10% memory overhead.


MEMORY_OVERHEAD_MIN is a constant equal to 384L, i.e. the minimum memory overhead (in MB).
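Taken together, the two constants suggest the overhead formula Spark on YARN applies to a requested memory size: the larger of 10% of the memory and the 384 MB minimum. The following is a sketch of that formula, not the exact implementation:

```scala
val MEMORY_OVERHEAD_FACTOR = 0.10
val MEMORY_OVERHEAD_MIN = 384L

// Memory overhead (in MB): the larger of 10% of the requested
// memory and the 384 MB floor.
def memoryOverhead(memoryMb: Long): Long =
  math.max((MEMORY_OVERHEAD_FACTOR * memoryMb).toLong, MEMORY_OVERHEAD_MIN)

println(memoryOverhead(1024))  // small containers hit the 384 MB floor
println(memoryOverhead(8192))  // larger containers get 10% of their memory
```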

Resolving Environment Variable — expandEnvironment Method

expandEnvironment(environment: Environment): String

expandEnvironment resolves an environment variable using YARN's Environment.$ or Environment.$$ methods (depending on the version of Hadoop in use).
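The difference between the two YARN methods can be sketched in plain Scala, assuming (this is an assumption about YARN's behavior, not Spark code) that Environment.$ produces a shell-specific reference ($VAR on Unix, %VAR% on Windows) while the newer Environment.$$ emits a {{VAR}} parameter-expansion marker that the NodeManager resolves on the target node:

```scala
// Sketch of cross-platform environment-variable expansion.
// `hasParameterExpansion` models whether the Hadoop version
// supports Environment.$$ ({{VAR}} markers).
def expandEnvironment(
    name: String,
    hasParameterExpansion: Boolean,
    isWindows: Boolean = false): String =
  if (hasParameterExpansion) s"{{$name}}"   // resolved on the target node
  else if (isWindows) s"%$name%"            // Windows shell syntax
  else s"$$$name"                           // Unix shell syntax

println(expandEnvironment("PWD", hasParameterExpansion = true))
println(expandEnvironment("PWD", hasParameterExpansion = false))
```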

Computing YARN’s ContainerId — getContainerId Method

getContainerId: ContainerId

getContainerId is a private[spark] method that reads YARN's ContainerId from the ApplicationConstants.Environment.CONTAINER_ID environment variable and converts it to a ContainerId using YARN's ConverterUtils.toContainerId.
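A self-contained sketch of the conversion (the ContainerId case class and parser below are hypothetical stand-ins for YARN's classes, shown only to illustrate how ConverterUtils.toContainerId splits the string form; the epoch-carrying container_e17_… variant is ignored here):

```scala
// Stand-in for YARN's ContainerId.
final case class ContainerId(
    clusterTimestamp: Long, appId: Int, attemptId: Int, containerId: Long)

// Sketch of ConverterUtils.toContainerId for strings like
// container_1410901177871_0001_01_000005
def toContainerId(containerIdStr: String): ContainerId = {
  val parts = containerIdStr.split("_")
  require(parts(0) == "container", s"Invalid ContainerId: $containerIdStr")
  ContainerId(parts(1).toLong, parts(2).toInt, parts(3).toInt, parts(4).toLong)
}

// In YarnSparkHadoopUtil the string comes from the YARN-provided
// environment, i.e. roughly:
//   sys.env(ApplicationConstants.Environment.CONTAINER_ID.name)
println(toContainerId("container_1410901177871_0001_01_000005"))
```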

Calculating Initial Number of Executors — getInitialTargetExecutorNumber Method

getInitialTargetExecutorNumber(conf: SparkConf, numExecutors: Int = 2): Int

getInitialTargetExecutorNumber calculates the initial number of executors for Spark on YARN. The result depends on whether dynamic allocation is enabled.

The default number of executors (aka DEFAULT_NUMBER_EXECUTORS) is 2.

With Dynamic Allocation of Executors enabled, getInitialTargetExecutorNumber is the value of spark.dynamicAllocation.initialExecutors, falling back to spark.dynamicAllocation.minExecutors and finally to 0 when neither is defined.

With Dynamic Allocation of Executors disabled, getInitialTargetExecutorNumber is the value of the spark.executor.instances property, the SPARK_EXECUTOR_INSTANCES environment variable, or the default of the numExecutors input parameter (2).
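The two branches above can be sketched in plain Scala, with simple Maps standing in for SparkConf and the process environment (property and variable names are the ones mentioned above; the precedence is a hedged reading of the description, not the exact implementation):

```scala
// Sketch of getInitialTargetExecutorNumber's decision logic.
def getInitialTargetExecutorNumber(
    conf: Map[String, String],
    env: Map[String, String] = Map.empty,
    numExecutors: Int = 2): Int = {
  val dynamicAllocation = conf.get("spark.dynamicAllocation.enabled").exists(_.toBoolean)
  if (dynamicAllocation) {
    // initialExecutors, falling back to minExecutors, then 0
    conf.get("spark.dynamicAllocation.initialExecutors")
      .orElse(conf.get("spark.dynamicAllocation.minExecutors"))
      .map(_.toInt)
      .getOrElse(0)
  } else {
    // spark.executor.instances, then SPARK_EXECUTOR_INSTANCES,
    // then the default of the numExecutors input parameter
    conf.get("spark.executor.instances")
      .orElse(env.get("SPARK_EXECUTOR_INSTANCES"))
      .map(_.toInt)
      .getOrElse(numExecutors)
  }
}

println(getInitialTargetExecutorNumber(Map.empty))  // default: 2
println(getInitialTargetExecutorNumber(
  Map("spark.dynamicAllocation.enabled" -> "true",
      "spark.dynamicAllocation.minExecutors" -> "4")))
```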

getInitialTargetExecutorNumber is used to calculate totalExpectedExecutors to start Spark on YARN in client or cluster modes.