# SparkEnv

`SparkEnv` is a handle to the Spark Execution Environment with the core services of Apache Spark (that interact with each other to establish a distributed computing platform for a Spark application).

There are two separate `SparkEnv`s: one for the driver and one for every executor.
## Core Services
Property | Service |
---|---|
blockManager | BlockManager |
broadcastManager | BroadcastManager |
closureSerializer | Serializer |
conf | SparkConf |
mapOutputTracker | MapOutputTracker |
memoryManager | MemoryManager |
metricsSystem | MetricsSystem |
outputCommitCoordinator | OutputCommitCoordinator |
rpcEnv | RpcEnv |
securityManager | SecurityManager |
serializer | Serializer |
serializerManager | SerializerManager |
shuffleManager | ShuffleManager |
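Most of these services are exposed through same-named accessors on `SparkEnv`, so they can be inspected from a running Spark application. The following is a minimal sketch (assuming an active `SparkContext`, e.g. in `spark-shell`; the accessor names follow the Property column above):

```scala
import org.apache.spark.SparkEnv

// SparkEnv.get returns the SparkEnv of the current JVM (the driver's one in spark-shell).
val env = SparkEnv.get

// Core services are available as accessors named after the Property column.
env.conf           // SparkConf
env.serializer     // Serializer
env.blockManager   // BlockManager
env.metricsSystem  // MetricsSystem
```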
## Creating Instance

`SparkEnv` takes the following to be created:

- Executor ID
- RpcEnv
- Serializer
- Serializer (for closures)
- SerializerManager
- MapOutputTracker
- ShuffleManager
- BroadcastManager
- BlockManager
- SecurityManager
- MetricsSystem
- MemoryManager
- OutputCommitCoordinator
- SparkConf
`SparkEnv` is created using the `create` utility.
## Driver's Temporary Directory

```scala
driverTmpDir: Option[String]
```

`SparkEnv` defines the `driverTmpDir` internal registry for the driver to be used as the root directory of files added using `SparkContext.addFile`.

`driverTmpDir` is undefined initially and is defined for the driver only when the `SparkEnv` utility is used to create a "base" `SparkEnv`.
## Demo

```scala
import org.apache.spark.SparkEnv

// :pa -raw
// BEGIN
package org.apache.spark

object BypassPrivateSpark {
  def driverTmpDir(sparkEnv: SparkEnv) = {
    sparkEnv.driverTmpDir
  }
}
// END

val driverTmpDir = org.apache.spark.BypassPrivateSpark.driverTmpDir(SparkEnv.get).get
```

The above is equivalent to the following snippet.

```scala
import org.apache.spark.SparkFiles

SparkFiles.getRootDirectory
```
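As a complementary (hedged) illustration of what `driverTmpDir` is for: files registered with `SparkContext.addFile` land under that directory on the driver and are resolved back with `SparkFiles.get`. The snippet below assumes a running `SparkContext` `sc` (e.g. in `spark-shell`) and uses a hypothetical local file path:

```scala
import org.apache.spark.SparkFiles

sc.addFile("/tmp/data.txt")               // hypothetical file path; must exist locally
val resolved = SparkFiles.get("data.txt") // resolves under the driver's driverTmpDir
println(resolved)
```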
## Creating SparkEnv for Driver

```scala
createDriverEnv(
  conf: SparkConf,
  isLocal: Boolean,
  listenerBus: LiveListenerBus,
  numCores: Int,
  mockOutputCommitCoordinator: Option[OutputCommitCoordinator] = None): SparkEnv
```

`createDriverEnv` creates a SparkEnv execution environment for the driver.

`createDriverEnv` accepts an instance of SparkConf, a flag indicating whether the application runs in local mode, a LiveListenerBus, the number of cores to use for execution in local mode (or 0 otherwise), and an optional OutputCommitCoordinator (default: none).

`createDriverEnv` ensures that the `spark.driver.host` and `spark.driver.port` settings are defined and then passes the call straight on to the `create` utility (with the driver executor ID, the `isDriver` flag enabled, and the input parameters).

`createDriverEnv` is used when `SparkContext` is created.
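A minimal sketch of the result (assuming a `spark-shell` session, i.e. `SparkContext` has already used `createDriverEnv`): the driver-side `SparkEnv` is available in the JVM, carries the driver executor ID, and has the `spark.driver.host` and `spark.driver.port` settings that `createDriverEnv` requires:

```scala
import org.apache.spark.SparkEnv

val env = SparkEnv.get
println(env.executorId)                     // "driver"
println(env.conf.get("spark.driver.host"))  // required by createDriverEnv
println(env.conf.get("spark.driver.port"))  // required by createDriverEnv
```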
## Creating SparkEnv for Executor

```scala
createExecutorEnv(
  conf: SparkConf,
  executorId: String,
  hostname: String,
  numCores: Int,
  ioEncryptionKey: Option[Array[Byte]],
  isLocal: Boolean): SparkEnv

createExecutorEnv(
  conf: SparkConf,
  executorId: String,
  bindAddress: String,
  hostname: String,
  numCores: Int,
  ioEncryptionKey: Option[Array[Byte]],
  isLocal: Boolean): SparkEnv
```

`createExecutorEnv` creates the Spark execution environment for an executor.

`createExecutorEnv` simply creates the "base" SparkEnv (with the given parameters).

NOTE: The number of cores (`numCores`) is configured using the `--cores` command-line option of `CoarseGrainedExecutorBackend` and is specific to a cluster manager.

`createExecutorEnv` is used when the `CoarseGrainedExecutorBackend` utility is requested to run.
Creating "Base" SparkEnv¶
create(
conf: SparkConf,
executorId: String,
bindAddress: String,
advertiseAddress: String,
port: Option[Int],
isLocal: Boolean,
numUsableCores: Int,
ioEncryptionKey: Option[Array[Byte]],
listenerBus: LiveListenerBus = null,
mockOutputCommitCoordinator: Option[OutputCommitCoordinator] = None): SparkEnv
`create` creates the "base" SparkEnv (that is common across the driver and executors).

`create` creates an RpcEnv named `sparkDriver` on the driver and `sparkExecutor` on executors.
`create` creates a Serializer (based on the `spark.serializer` configuration property) and prints out the following DEBUG message to the logs:

```text
Using serializer: [serializer]
```
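As a hedged illustration of the configuration behind this step (the property name is the real Spark key; the value is only an example):

```scala
import org.apache.spark.SparkConf

// spark.serializer decides which Serializer `create` instantiates
// (KryoSerializer is an illustrative choice; JavaSerializer is the default).
val conf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
```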
`create` creates a SerializerManager.

`create` creates a JavaSerializer as the closure serializer.
`create` creates a BroadcastManager.

`create` creates a MapOutputTrackerMaster (on the driver) or a MapOutputTrackerWorker (on executors), and registers or looks up a MapOutputTrackerMasterEndpoint under the name of MapOutputTracker. On the driver only, `create` prints out the following INFO message to the logs:

```text
Registering MapOutputTracker
```
`create` creates a ShuffleManager (based on the `spark.shuffle.manager` configuration property).

`create` creates a UnifiedMemoryManager.

With the `spark.shuffle.service.enabled` configuration property enabled, `create` creates an ExternalBlockStoreClient.
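Similarly, a hedged sketch of the configuration properties behind the last two steps (real Spark keys; `sort` is the default shuffle manager, while enabling the external shuffle service is an illustrative override):

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.shuffle.manager", "sort")          // decides which ShuffleManager is created
  .set("spark.shuffle.service.enabled", "true")  // makes create set up an ExternalBlockStoreClient
```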
`create` creates a BlockManagerMaster.

`create` creates a NettyBlockTransferService.

`create` creates a BlockManager.

`create` creates a MetricsSystem.

`create` creates an OutputCommitCoordinator and registers or looks up an OutputCommitCoordinatorEndpoint under the name of OutputCommitCoordinator.

In the end, `create` creates a SparkEnv (with all the services "stitched" together).
## Logging

Enable `ALL` logging level for the `org.apache.spark.SparkEnv` logger to see what happens inside.

Add the following line to `conf/log4j.properties`:

```text
log4j.logger.org.apache.spark.SparkEnv=ALL
```
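If the Spark distribution uses Log4j 2 (Spark 3.3 and later), an equivalent entry could go to `conf/log4j2.properties` instead (the `sparkenv` logger key below is an arbitrary name used for this sketch):

```properties
logger.sparkenv.name = org.apache.spark.SparkEnv
logger.sparkenv.level = all
```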
Refer to Logging.