ExternalShuffleService is a Spark service that can serve RDD and shuffle blocks.
ExternalShuffleService manages shuffle output files so they are available to executors. As the shuffle output files are managed externally to the executors, access to them remains uninterrupted even when executors are killed or go down (esp. with Dynamic Allocation of Executors).
ExternalShuffleService can be launched from the command line.
ExternalShuffleService is enabled on the driver and executors using the spark.shuffle.service.enabled configuration property.
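For example, a spark-defaults.conf fragment to enable the service (7337 is the default value of spark.shuffle.service.port):

```
spark.shuffle.service.enabled  true
spark.shuffle.service.port     7337
```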
Spark on YARN uses a custom external shuffle service (YarnShuffleService).
ExternalShuffleService can be launched as a standalone application using spark-class.
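For example, from a Spark distribution directory:

```
./bin/spark-class org.apache.spark.deploy.ExternalShuffleService
```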
main Entry Point¶
main(args: Array[String]): Unit
main is the entry point of the ExternalShuffleService standalone application.
main prints out the following INFO message to the logs:
Started daemon with process name: [name]
main registers signal handlers (for TERM, HUP, INT signals).
main creates a SparkConf (with the default Spark properties loaded) and a SecurityManager.
main turns spark.shuffle.service.enabled to true explicitly (since this service is started from the command line for a reason).
main creates an ExternalShuffleService and requests it to start.
main prints out the following DEBUG message to the logs:
Adding shutdown hook
main registers a shutdown hook. When triggered, the shutdown hook prints out the following INFO message to the logs and requests the ExternalShuffleService to stop:
Shutting down shuffle service.
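The shutdown-hook behaviour can be sketched in plain Scala with no Spark dependencies (StubShuffleService is a made-up stand-in for ExternalShuffleService; triggering the hook manually only simulates a JVM shutdown):

```scala
object ShutdownHookDemo {
  // Hypothetical stand-in for ExternalShuffleService.
  class StubShuffleService {
    @volatile var running = true
    def stop(): Unit = running = false
  }

  // Returns true when the simulated shutdown hook stopped the service.
  def demo(): Boolean = {
    val service = new StubShuffleService
    // Register a JVM shutdown hook that logs and stops the service,
    // mirroring what ExternalShuffleService.main does.
    val hook = sys.addShutdownHook {
      println("Shutting down shuffle service.")
      service.stop()
    }
    hook.run()    // for demonstration only: run the hook body directly
    hook.remove() // deregister so it does not fire again at real JVM exit
    !service.running
  }

  def main(args: Array[String]): Unit =
    println(s"stopped = ${demo()}")
}
```

scala.sys.addShutdownHook is just a thin wrapper around Runtime.addShutdownHook, which is why the hook can be deregistered with remove().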
ExternalShuffleService takes the following to be created:
SparkConf
SecurityManager
ExternalShuffleService is created when:
ExternalShuffleService standalone application is started
Worker (Spark Standalone) is created (and initializes an ExternalShuffleService)
ExternalShuffleService uses an ExternalBlockHandler to handle RPC messages (and serve RDD blocks and shuffle blocks).
The blockHandler is used to create the TransportContext (and in turn the TransportServer) at startup, and is a source of the service's metrics.
With the spark.shuffle.service.db.enabled and spark.shuffle.service.enabled configuration properties enabled, the ExternalBlockHandler is given a local directory with a registeredExecutors.ldb file.
blockHandler is used when:
ExternalShuffleService is requested to start, applicationRemoved and executorRemoved
findRegisteredExecutorsDBFile¶
findRegisteredExecutorsDBFile(dbName: String): File
findRegisteredExecutorsDBFile returns one of the local directories (defined using the spark.local.dir configuration property) with the input dbName file, or null when no directories are defined.
findRegisteredExecutorsDBFile searches the local directories (defined using the spark.local.dir configuration property) for the input dbName file. If the file is not found in any of them, findRegisteredExecutorsDBFile takes the first local directory.
With no local directories defined in the spark.local.dir configuration property, findRegisteredExecutorsDBFile prints out the following WARN message to the logs and returns null:
'spark.local.dir' should be set first when we use db in ExternalShuffleService. Note that this only affects standalone mode.
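The directory-selection logic above can be sketched in plain Scala (localDirs stands in for the directories from spark.local.dir; this is a sketch of the behaviour, not the actual implementation):

```scala
import java.io.File

object DBFileLookup {
  // Sketch of findRegisteredExecutorsDBFile: return the dbName file from
  // the first local directory that already contains it, else a dbName file
  // under the first local directory, else null (with a warning) when no
  // local directories are defined.
  def findRegisteredExecutorsDBFile(localDirs: Array[String], dbName: String): File = {
    if (localDirs.isEmpty) {
      println("'spark.local.dir' should be set first when we use db in " +
        "ExternalShuffleService. Note that this only affects standalone mode.")
      null
    } else {
      localDirs
        .map(dir => new File(dir, dbName))
        .find(_.exists())
        .getOrElse(new File(localDirs.head, dbName))
    }
  }
}
```

Falling back to the first local directory means the database file will be created there on first use, so subsequent lookups find it again.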
Starting External Shuffle Service¶
start(): Unit
start prints out the following INFO message to the logs:
Starting shuffle service on port [port] (auth enabled = [authEnabled])
start creates an AuthServerBootstrap with authentication enabled (using the SecurityManager).
start is used when:
ExternalShuffleService is requested to startIfEnabled and is launched (as a command-line application)
startIfEnabled is used when:
Worker (Spark Standalone) is requested to
Executor Removed Notification¶
executorRemoved(executorId: String, appId: String): Unit
executorRemoved is used when:
Worker (Spark Standalone) is requested to handleExecutorStateChanged
Application Finished Notification¶
applicationRemoved(appId: String): Unit
applicationRemoved is used when:
Logging¶
Enable ALL logging level for org.apache.spark.deploy.ExternalShuffleService logger to see what happens inside.
Add the following line to conf/log4j.properties:
log4j.logger.org.apache.spark.deploy.ExternalShuffleService=ALL
Refer to Logging.