SparkSubmit¶
SparkSubmit is the entry point of the spark-submit shell script.
Special Primary Resource Names¶
SparkSubmit uses the following special primary resource names to represent Spark shells rather than application jars:
- spark-shell
- pyspark-shell
- sparkr-shell
pyspark-shell¶
SparkSubmit uses pyspark-shell when:
- SparkSubmit is requested to prepareSubmitEnvironment (for .py scripts or pyspark), isShell and isPython
isShell¶
isShell(
res: String): Boolean
isShell is true when the given res primary resource represents a Spark shell.
isShell is used when:
- SparkSubmit is requested to prepareSubmitEnvironment and isUserJar
- SparkSubmitArguments is requested to handleUnknown (and determine a primary application resource)
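The check boils down to membership in the set of special primary resource names listed above. A minimal Python sketch of the predicate (the original is Scala inside SparkSubmit; names mirror the documented constants):

```python
# Special primary resource names that represent Spark shells
SPARK_SHELL = "spark-shell"
PYSPARK_SHELL = "pyspark-shell"
SPARKR_SHELL = "sparkr-shell"

def is_shell(res: str) -> bool:
    """True when the given primary resource represents a Spark shell."""
    return res in (SPARK_SHELL, PYSPARK_SHELL, SPARKR_SHELL)
```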
Actions¶
SparkSubmit executes actions (based on the action argument).
Killing Submission¶
kill(
args: SparkSubmitArguments): Unit
kill...FIXME
Displaying Version¶
printVersion(): Unit
printVersion...FIXME
Submission Status¶
requestStatus(
args: SparkSubmitArguments): Unit
requestStatus...FIXME
Application Submission¶
submit(
args: SparkSubmitArguments,
uninitLog: Boolean): Unit
submit doRunMain unless isStandaloneCluster and useRest.
For isStandaloneCluster with useRest requested, submit...FIXME
doRunMain¶
doRunMain(): Unit
doRunMain runMain unless proxyUser is specified.
With proxyUser specified, doRunMain...FIXME
Running Main Class¶
runMain(
args: SparkSubmitArguments,
uninitLog: Boolean): Unit
runMain prepares submit environment for the given SparkSubmitArguments (that gives childArgs, childClasspath, sparkConf and childMainClass).
With verbose enabled, runMain prints out the following INFO messages to the logs:
Main class:
[childMainClass]
Arguments:
[childArgs]
Spark config:
[sparkConf_redacted]
Classpath elements:
[childClasspath]
runMain creates and sets a context classloader (based on spark.driver.userClassPathFirst configuration property) and adds the jars (from childClasspath).
runMain loads the main class (childMainClass).
runMain creates a SparkApplication (if the main class is a subtype of SparkApplication) or creates a JavaMainApplication (with the main class).
In the end, runMain requests the SparkApplication to start (with the childArgs and sparkConf).
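The SparkApplication-or-JavaMainApplication decision can be sketched in Python (class names mirror the Scala ones; the reflective lookup of the static main method is simplified to a direct call):

```python
class SparkApplication:
    """Entry-point contract: implementations are started with args and conf."""
    def start(self, args, conf):
        raise NotImplementedError

class JavaMainApplication(SparkApplication):
    """Wraps a plain main class so runMain can start it uniformly."""
    def __init__(self, klass):
        self.klass = klass
    def start(self, args, conf):
        # stand-in for the reflective invocation of the class's static main
        self.klass.main(args)

def wrap_main_class(klass):
    # runMain: use the class directly when it subclasses SparkApplication,
    # otherwise wrap it in a JavaMainApplication
    if issubclass(klass, SparkApplication):
        return klass()
    return JavaMainApplication(klass)
```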
Cluster Managers¶
SparkSubmit has a built-in support for some cluster managers (that are selected based on the master argument).
| Nickname | Master URL |
|---|---|
| KUBERNETES | k8s:// prefix |
| LOCAL | local prefix |
| MESOS | mesos prefix |
| STANDALONE | spark prefix |
| YARN | yarn |
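The table above amounts to a prefix match on the master URL; a Python sketch of the selection (error handling simplified):

```python
def cluster_manager(master: str) -> str:
    """Map a master URL to its cluster-manager nickname (per the table above)."""
    if master.startswith("k8s://"):
        return "KUBERNETES"
    if master.startswith("local"):
        return "LOCAL"
    if master.startswith("mesos"):
        return "MESOS"
    if master.startswith("spark"):
        return "STANDALONE"
    if master == "yarn":
        return "YARN"
    raise ValueError(f"unknown master URL: {master}")
```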
Launching Standalone Application¶
main(
args: Array[String]): Unit
main creates a SparkSubmit to doSubmit (with the given args).
doSubmit¶
doSubmit(
args: Array[String]): Unit
doSubmit initializeLogIfNecessary.
doSubmit parses the arguments in the given args (that gives a SparkSubmitArguments).
With verbose option on, doSubmit prints out the appArgs to standard output.
doSubmit branches off based on action.
| Action | Handler |
|---|---|
| SUBMIT | submit |
| KILL | kill |
| REQUEST_STATUS | requestStatus |
| PRINT_VERSION | printVersion |
doSubmit is used when:
- InProcessSparkSubmit standalone application is started
- SparkSubmit standalone application is started
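The action-to-handler dispatch in the table above can be sketched as a lookup (the handler bodies here are placeholder stubs, not the real implementations):

```python
# Placeholder handlers standing in for SparkSubmit's methods
def submit(args): return ("submit", args)
def kill(args): return ("kill", args)
def request_status(args): return ("requestStatus", args)
def print_version(args): return ("printVersion", args)

def do_submit(action: str, args):
    """Dispatch on the parsed action (per the table above)."""
    handlers = {
        "SUBMIT": submit,
        "KILL": kill,
        "REQUEST_STATUS": request_status,
        "PRINT_VERSION": print_version,
    }
    return handlers[action](args)
```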
Parsing Arguments¶
parseArguments(
args: Array[String]): SparkSubmitArguments
parseArguments creates a SparkSubmitArguments (with the given args).
prepareSubmitEnvironment¶
prepareSubmitEnvironment(
args: SparkSubmitArguments,
conf: Option[HadoopConfiguration] = None): (Seq[String], Seq[String], SparkConf, String)
prepareSubmitEnvironment creates a 4-element tuple made up of the following:
- childArgs for arguments
- childClasspath for Classpath elements
- sysProps for Spark properties
- childMainClass
Tip
Use the --verbose command-line option to have the elements of the tuple printed out to the standard output.
prepareSubmitEnvironment...FIXME
For isPython in CLIENT deploy mode, prepareSubmitEnvironment sets the following based on primaryResource:
- For pyspark-shell, the mainClass is org.apache.spark.api.python.PythonGatewayServer
- Otherwise, the mainClass is org.apache.spark.deploy.PythonRunner (with the main python file, extra python files and the childArgs)
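The two rules above reduce to a simple branch on the primary resource; a Python sketch:

```python
PYSPARK_SHELL = "pyspark-shell"

def python_main_class(primary_resource: str) -> str:
    """mainClass selection for isPython in CLIENT deploy mode (per the rules above)."""
    if primary_resource == PYSPARK_SHELL:
        return "org.apache.spark.api.python.PythonGatewayServer"
    return "org.apache.spark.deploy.PythonRunner"
```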
prepareSubmitEnvironment...FIXME
prepareSubmitEnvironment determines the cluster manager based on master argument.
For KUBERNETES, prepareSubmitEnvironment checkAndGetK8sMasterUrl.
prepareSubmitEnvironment...FIXME
prepareSubmitEnvironment is used when...FIXME
childMainClass¶
childMainClass is the 4th (last) element of the result tuple of prepareSubmitEnvironment.
// (childArgs, childClasspath, sparkConf, childMainClass)
(Seq[String], Seq[String], SparkConf, String)
childMainClass can be as follows (based on the deployMode):
| Deploy Mode | Master URL | childMainClass |
|---|---|---|
| client | any | mainClass |
| cluster | KUBERNETES | KubernetesClientApplication |
| cluster | MESOS | RestSubmissionClientApp (for REST submission API) |
| cluster | STANDALONE | RestSubmissionClientApp (for REST submission API) |
| cluster | STANDALONE | ClientApp |
| cluster | YARN | YarnClusterApplication |
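The table above can be sketched as a selection function (simplified: the real prepareSubmitEnvironment weighs more conditions; the use_rest flag models the STANDALONE split between the two rows):

```python
def child_main_class(deploy_mode, cluster_mgr, main_class, use_rest=True):
    """childMainClass per the table above (a simplified sketch)."""
    if deploy_mode == "client":
        return main_class
    if cluster_mgr == "KUBERNETES":
        return "KubernetesClientApplication"
    if cluster_mgr == "MESOS":
        return "RestSubmissionClientApp"
    if cluster_mgr == "STANDALONE":
        # REST submission API when enabled, legacy ClientApp otherwise
        return "RestSubmissionClientApp" if use_rest else "ClientApp"
    if cluster_mgr == "YARN":
        return "YarnClusterApplication"
    raise ValueError(f"unknown cluster manager: {cluster_mgr}")
```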
isKubernetesClient¶
prepareSubmitEnvironment uses the isKubernetesClient flag to indicate that the cluster manager is KUBERNETES and the deploy mode is CLIENT.
isKubernetesClusterModeDriver¶
prepareSubmitEnvironment uses isKubernetesClusterModeDriver flag to indicate that:
- isKubernetesClient
- spark.kubernetes.submitInDriver configuration property is enabled (Spark on Kubernetes)
renameResourcesToLocalFS¶
renameResourcesToLocalFS(
resources: String,
localResources: String): String
renameResourcesToLocalFS...FIXME
renameResourcesToLocalFS is used for isKubernetesClusterModeDriver mode.
downloadResource¶
downloadResource(
resource: String): String
downloadResource...FIXME
Checking Whether Resource is Internal¶
isInternal(
res: String): Boolean
isInternal is true when the given res is spark-internal.
isInternal is used when:
- SparkSubmit is requested to isUserJar
- SparkSubmitArguments is requested to handleUnknown
isUserJar¶
isUserJar(
res: String): Boolean
isUserJar is true when the given res is none of the following:
- isShell
- isPython
- isInternal
- isR
isUserJar is used when:
- FIXME
isPython¶
isPython(
res: String): Boolean
isPython is positive (true) when the given res primary resource represents a PySpark application:
- a .py script
- pyspark-shell
isPython is used when:
- SparkSubmit is requested to isUserJar
- SparkSubmitArguments is requested to handle an unknown option
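Taken together, the resource predicates compose into isUserJar as the negation of all the special cases. A Python sketch of the whole family (is_r is an assumption modeled on the documented shell names and the .py rule for isPython; the Scala originals live in SparkSubmit):

```python
SHELLS = ("spark-shell", "pyspark-shell", "sparkr-shell")

def is_shell(res):
    return res in SHELLS

def is_python(res):
    return res.endswith(".py") or res == "pyspark-shell"

def is_r(res):
    # assumed analogue of is_python for R resources
    return res.endswith(".R") or res == "sparkr-shell"

def is_internal(res):
    return res == "spark-internal"

def is_user_jar(res):
    """True when res is none of the special resources (isUserJar)."""
    return not (is_shell(res) or is_python(res) or is_internal(res) or is_r(res))
```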