
SparkEnv

SparkEnv is a handle to the Spark Execution Environment with the core services of Apache Spark (that interact with each other to establish a distributed computing platform for a Spark application).

There are two separate SparkEnvs: one for the driver and one for every executor.

Core Services

[cols="1,1",options="header",width="100%"]
|===
| Property | Service

| blockManager | BlockManager
| broadcastManager | BroadcastManager
| closureSerializer | Serializer
| conf | SparkConf
| mapOutputTracker | MapOutputTracker
| memoryManager | MemoryManager
| metricsSystem | MetricsSystem
| outputCommitCoordinator | OutputCommitCoordinator
| rpcEnv | RpcEnv
| securityManager | SecurityManager
| serializer | Serializer
| serializerManager | SerializerManager
| shuffleManager | ShuffleManager
|===
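
These services are available through the corresponding properties of a SparkEnv handle, e.g. (a minimal sketch, assuming a Spark application is already up so SparkEnv.get is initialized):

import org.apache.spark.SparkEnv

// The core services are exposed as fields of the SparkEnv handle
val env = SparkEnv.get
val blockManager  = env.blockManager
val serializer    = env.serializer
val memoryManager = env.memoryManager
val sparkConf     = env.conf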

Creating Instance

SparkEnv takes an executor ID and the core services listed above to be created.

SparkEnv is created using the create utility.

Creating SparkEnv for Driver

createDriverEnv(
  conf: SparkConf,
  isLocal: Boolean,
  listenerBus: LiveListenerBus,
  numCores: Int,
  mockOutputCommitCoordinator: Option[OutputCommitCoordinator] = None): SparkEnv

createDriverEnv creates a SparkEnv execution environment for the driver.


createDriverEnv accepts an instance of ROOT:SparkConf.md[SparkConf], spark-deployment-environments.md[whether it runs in local mode or not], a scheduler:LiveListenerBus.md[LiveListenerBus], the number of cores to use for execution in local mode or 0 otherwise, and an optional scheduler:OutputCommitCoordinator.md[OutputCommitCoordinator] (default: none).

createDriverEnv ensures that spark-driver.md#spark_driver_host[spark.driver.host] and spark-driver.md#spark_driver_port[spark.driver.port] settings are defined.

It then passes the call straight on to the create utility (with the driver executor ID, isDriver enabled, and the input parameters).

createDriverEnv is used when SparkContext is created.
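
For illustration, creating a SparkContext triggers createDriverEnv under the covers and registers the driver-side SparkEnv as the default one (a minimal sketch, assuming a local master):

import org.apache.spark.{SparkConf, SparkContext, SparkEnv}

// SparkContext uses SparkEnv.createDriverEnv while being created
val sc = new SparkContext(
  new SparkConf().setMaster("local[1]").setAppName("sparkenv-driver-demo"))

// The driver's SparkEnv is now available through SparkEnv.get
assert(SparkEnv.get != null)
assert(SparkEnv.get.executorId == "driver")  // the driver's executor ID

sc.stop()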

Creating SparkEnv for Executor

createExecutorEnv(
  conf: SparkConf,
  executorId: String,
  hostname: String,
  numCores: Int,
  ioEncryptionKey: Option[Array[Byte]],
  isLocal: Boolean): SparkEnv
createExecutorEnv(
  conf: SparkConf,
  executorId: String,
  bindAddress: String,
  hostname: String,
  numCores: Int,
  ioEncryptionKey: Option[Array[Byte]],
  isLocal: Boolean): SparkEnv

createExecutorEnv creates the Spark execution environment for an executor.


createExecutorEnv simply creates the base SparkEnv (passing in all the input parameters) and sets it as the default SparkEnv (using the set utility).

NOTE: The number of cores numCores is configured using the --cores command-line option of CoarseGrainedExecutorBackend and is specific to a cluster manager.

createExecutorEnv is used when CoarseGrainedExecutorBackend utility is requested to run.
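
For illustration, code that runs inside a task sees the executor-side SparkEnv (a minimal sketch, assuming sc is an active SparkContext on a real cluster manager; in local mode the driver and the executor share a single SparkEnv):

import org.apache.spark.SparkEnv

// Runs on executors: SparkEnv.get returns the executor's SparkEnv there
sc.parallelize(1 to 2, numSlices = 2).foreachPartition { _ =>
  val env = SparkEnv.get
  println(s"SparkEnv of executor ${env.executorId}")
}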

Creating "Base" SparkEnv (for Driver and Executors)

create(
  conf: SparkConf,
  executorId: String,
  bindAddress: String,
  advertiseAddress: String,
  port: Option[Int],
  isLocal: Boolean,
  numUsableCores: Int,
  ioEncryptionKey: Option[Array[Byte]],
  listenerBus: LiveListenerBus = null,
  mockOutputCommitCoordinator: Option[OutputCommitCoordinator] = None): SparkEnv

create is a utility that creates the "base" SparkEnv (that is "enhanced" for the driver and executors later on).

.create's Input Arguments and Their Usage
[cols="1,2",options="header",width="100%"]
|===
| Input Argument | Usage

| bindAddress
| Used to create rpc:index.md[RpcEnv] and storage:NettyBlockTransferService.md#creating-instance[NettyBlockTransferService].

| advertiseAddress
| Used to create rpc:index.md[RpcEnv] and storage:NettyBlockTransferService.md#creating-instance[NettyBlockTransferService].

| numUsableCores
| Used to create memory:MemoryManager.md[MemoryManager], storage:NettyBlockTransferService.md#creating-instance[NettyBlockTransferService] and storage:BlockManager.md#creating-instance[BlockManager].
|===

[[create-Serializer]] create creates a Serializer (based on the spark.serializer configuration property). You should see the following DEBUG message in the logs:

Using serializer: [serializer]

[[create-closure-Serializer]] create creates a closure Serializer (a JavaSerializer).

[[ShuffleManager]][[create-ShuffleManager]] create creates a shuffle:ShuffleManager.md[ShuffleManager] given the value of ROOT:configuration-properties.md#spark.shuffle.manager[spark.shuffle.manager] configuration property.
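
Both the serializer and the shuffle manager are therefore controlled with regular configuration properties, e.g. (a sketch; the values shown are the usual ones):

import org.apache.spark.SparkConf

val conf = new SparkConf()
  // Resolved and instantiated by create as the Serializer
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  // Short name resolved to org.apache.spark.shuffle.sort.SortShuffleManager (the default)
  .set("spark.shuffle.manager", "sort")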

[[MemoryManager]][[create-MemoryManager]] create creates a memory:MemoryManager.md[MemoryManager] based on ROOT:configuration-properties.md#spark.memory.useLegacyMode[spark.memory.useLegacyMode] setting (with memory:UnifiedMemoryManager.md[UnifiedMemoryManager] being the default and numCores the input numUsableCores).

[[NettyBlockTransferService]][[create-NettyBlockTransferService]] create creates a storage:NettyBlockTransferService.md#creating-instance[NettyBlockTransferService] with the following ports:

  • spark-driver.md#spark_driver_blockManager_port[spark.driver.blockManager.port] for the driver (default: 0)

  • storage:BlockManager.md#spark_blockManager_port[spark.blockManager.port] for an executor (default: 0)
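
The ports come from regular configuration properties, e.g. (a sketch; the port values are arbitrary examples, 0 means a random port):

import org.apache.spark.SparkConf

val conf = new SparkConf()
  // Port of the driver's NettyBlockTransferService
  .set("spark.driver.blockManager.port", "40000")
  // Port of an executor's NettyBlockTransferService
  .set("spark.blockManager.port", "40010")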

NOTE: create uses the NettyBlockTransferService to create the BlockManager.

CAUTION: FIXME A picture with SparkEnv, NettyBlockTransferService and the ports "armed".

[[BlockManagerMaster]][[create-BlockManagerMaster]] create creates a storage:BlockManagerMaster.md#creating-instance[BlockManagerMaster] object with the BlockManagerMaster RPC endpoint reference (registered or looked up by name with a storage:BlockManagerMasterEndpoint.md[BlockManagerMasterEndpoint]), the input ROOT:SparkConf.md[SparkConf], and the input isDriver flag.

.Creating BlockManager for the Driver
image::sparkenv-driver-blockmanager.png[align="center"]

NOTE: create registers the BlockManagerMaster RPC endpoint for the driver and looks it up for executors.

.Creating BlockManager for Executor
image::sparkenv-executor-blockmanager.png[align="center"]

[[BlockManager]][[create-BlockManager]] create creates a storage:BlockManager.md#creating-instance[BlockManager] (using the above BlockManagerMaster, NettyBlockTransferService and other services).

create creates a core:BroadcastManager.md[BroadcastManager].

[[MapOutputTracker]][[create-MapOutputTracker]] create creates a scheduler:MapOutputTrackerMaster.md[MapOutputTrackerMaster] or scheduler:MapOutputTrackerWorker.md[MapOutputTrackerWorker] for the driver and executors, respectively.

NOTE: The choice of the real implementation of scheduler:MapOutputTracker.md[MapOutputTracker] is based on whether the input executorId is driver or not.
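
A sketch of that decision (illustrative only; executorId, conf, broadcastManager and isLocal are create's inputs and locals, and the constructor arguments are indicative rather than authoritative):

// Inside SparkEnv.create (sketch, not the exact code)
val mapOutputTracker =
  if (executorId == "driver") {
    // driver: tracks the map output locations of all shuffles
    new MapOutputTrackerMaster(conf, broadcastManager, isLocal)
  } else {
    // executor: asks the driver's MapOutputTrackerMaster for map output locations
    new MapOutputTrackerWorker(conf)
  }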

[[MapOutputTrackerMasterEndpoint]][[create-MapOutputTrackerMasterEndpoint]] create registers or looks up an RpcEndpoint under the name MapOutputTracker. It registers a scheduler:MapOutputTrackerMasterEndpoint.md[MapOutputTrackerMasterEndpoint] on the driver and creates an RPC endpoint reference on executors. The RPC endpoint reference gets assigned as the scheduler:MapOutputTracker.md#trackerEndpoint[MapOutputTracker RPC endpoint].

CAUTION: FIXME

[[create-CacheManager]] It creates a CacheManager.

[[create-MetricsSystem]] It creates a MetricsSystem, separately for the driver and an executor.

It initializes the directory for downloaded dependencies (user files): a temporary directory for the driver and the current working directory for an executor.

[[create-OutputCommitCoordinator]] An OutputCommitCoordinator is created.

Usage

create is used when the SparkEnv utility is requested to create a SparkEnv for the driver (createDriverEnv) and executors (createExecutorEnv).

Accessing SparkEnv

get: SparkEnv

get returns the SparkEnv on the driver and executors.

import org.apache.spark.SparkEnv
assert(SparkEnv.get.isInstanceOf[SparkEnv])


Registering or Looking up RPC Endpoint by Name

registerOrLookupEndpoint(
  name: String,
  endpointCreator: => RpcEndpoint): RpcEndpointRef


registerOrLookupEndpoint registers or looks up a RPC endpoint by name.

If called from the driver, you should see the following INFO message in the logs:

Registering [name]

And the RPC endpoint is registered in the RPC environment.

Otherwise, it obtains a RPC endpoint reference by name.
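
A sketch of the logic (assuming isDriver, conf and rpcEnv are the enclosing values in SparkEnv.create; RpcEnv.setupEndpoint and RpcUtils.makeDriverRef are the underlying APIs):

import org.apache.spark.rpc.{RpcEndpoint, RpcEndpointRef}
import org.apache.spark.util.RpcUtils

// Register the endpoint on the driver, look it up (by name) on executors
def registerOrLookupEndpoint(name: String, endpointCreator: => RpcEndpoint): RpcEndpointRef =
  if (isDriver) {
    logInfo(s"Registering $name")
    rpcEnv.setupEndpoint(name, endpointCreator)
  } else {
    RpcUtils.makeDriverRef(name, conf, rpcEnv)
  }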

Stopping SparkEnv

stop(): Unit

stop checks the isStopped internal flag and does nothing when it is already enabled.

Otherwise, stop turns isStopped flag on, stops all pythonWorkers and requests the following services to stop:

  1. scheduler:MapOutputTracker.md#stop[MapOutputTracker]
  2. shuffle:ShuffleManager.md#stop[ShuffleManager]
  3. core:BroadcastManager.md#stop[BroadcastManager]
  4. storage:BlockManager.md#stop[BlockManager]
  5. storage:BlockManagerMaster.md#stop[BlockManagerMaster]
  6. spark-metrics-MetricsSystem.md#stop[MetricsSystem]
  7. scheduler:OutputCommitCoordinator.md#stop[OutputCommitCoordinator]

stop rpc:index.md#shutdown[requests RpcEnv to shut down] and rpc:index.md#awaitTermination[waits till it terminates].

Only on the driver, stop deletes the driverTmpDir temporary directory. You can see the following WARN message in the logs if the deletion fails.

Exception while deleting Spark temp dir: [path]

NOTE: stop is used when ROOT:SparkContext.md#stop[SparkContext stops] (on the driver) and executor:Executor.md#stop[Executor stops].

set Method

set(e: SparkEnv): Unit

set saves the input SparkEnv to the env internal registry (as the default SparkEnv).

NOTE: set is used when...FIXME
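
For illustration (a sketch; env stands for any SparkEnv instance, e.g. one built with the create utilities):

import org.apache.spark.SparkEnv

val env: SparkEnv = SparkEnv.get  // any SparkEnv instance would do here

// Make it the default SparkEnv returned by SparkEnv.get
SparkEnv.set(env)
assert(SparkEnv.get eq env)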

environmentDetails Utility

environmentDetails(
  conf: SparkConf,
  schedulingMode: String,
  addedJars: Seq[String],
  addedFiles: Seq[String]): Map[String, Seq[(String, String)]]


environmentDetails...FIXME

environmentDetails is used when SparkContext is requested to ROOT:SparkContext.md#postEnvironmentUpdate[post a SparkListenerEnvironmentUpdate event].

Logging

Enable ALL logging level for org.apache.spark.SparkEnv logger to see what happens inside.

Add the following line to conf/log4j.properties:

log4j.logger.org.apache.spark.SparkEnv=ALL

Refer to ROOT:spark-logging.md[Logging].

Internal Properties

[cols="30m,70",options="header",width="100%"]
|===
| Name | Description

| isStopped
| [[isStopped]] Used to mark SparkEnv stopped

Default: false

| driverTmpDir
| [[driverTmpDir]] Temporary directory of the driver for downloaded dependencies (deleted when the driver's SparkEnv is stopped)

|===

