# SparkSubmit

`SparkSubmit` is the entry point of the `spark-submit` shell script.
## Special Primary Resource Names

`SparkSubmit` uses the following special primary resource names to represent Spark shells rather than application jars:

* `spark-shell`
* `pyspark-shell`
* `sparkr-shell`
### pyspark-shell

`SparkSubmit` uses `pyspark-shell` when:

* `SparkSubmit` is requested to prepareSubmitEnvironment for `.py` scripts or `pyspark`, isShell and isPython
## isShell

```scala
isShell(
  res: String): Boolean
```

`isShell` is `true` when the given `res` primary resource represents a Spark shell.
`isShell` is used when:

* `SparkSubmit` is requested to prepareSubmitEnvironment and isUserJar
* `SparkSubmitArguments` is requested to handleUnknown (and determine a primary application resource)
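A minimal, self-contained sketch of the check, assuming `isShell` is a plain equality test against the three special shell resource names:

```scala
// Sketch only: assumes isShell simply compares the primary resource
// against the three special shell names.
def isShell(res: String): Boolean =
  res == "spark-shell" || res == "pyspark-shell" || res == "sparkr-shell"
```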
## Actions

`SparkSubmit` executes actions (based on the action argument).
### Killing Submission

```scala
kill(
  args: SparkSubmitArguments): Unit
```

`kill`...FIXME
### Displaying Version

```scala
printVersion(): Unit
```

`printVersion`...FIXME
### Submission Status

```scala
requestStatus(
  args: SparkSubmitArguments): Unit
```

`requestStatus`...FIXME
### Application Submission

```scala
submit(
  args: SparkSubmitArguments,
  uninitLog: Boolean): Unit
```

`submit` doRunMain unless isStandaloneCluster and useRest.

For isStandaloneCluster with useRest requested, `submit`...FIXME
#### doRunMain

```scala
doRunMain(): Unit
```

`doRunMain` runMain unless proxyUser is specified.

With proxyUser specified, `doRunMain`...FIXME
### Running Main Class

```scala
runMain(
  args: SparkSubmitArguments,
  uninitLog: Boolean): Unit
```

`runMain` prepares submit environment for the given `SparkSubmitArguments` (that gives `childArgs`, `childClasspath`, `sparkConf` and `childMainClass`).
With verbose enabled, `runMain` prints out the following INFO messages to the logs:

```text
Main class:
[childMainClass]
Arguments:
[childArgs]
Spark config:
[sparkConf_redacted]
Classpath elements:
[childClasspath]
```
`runMain` creates and sets a context classloader (based on the `spark.driver.userClassPathFirst` configuration property) and adds the jars (from `childClasspath`).

`runMain` loads the main class (`childMainClass`).

`runMain` creates a `SparkApplication` (if the main class is a subtype of it) or a `JavaMainApplication` (with the main class).
In the end, `runMain` requests the `SparkApplication` to start (with the `childArgs` and `sparkConf`).
## Cluster Managers

`SparkSubmit` has built-in support for some cluster managers (selected based on the master argument).

| Nickname | Master URL |
|---|---|
| KUBERNETES | `k8s://`-prefixed |
| LOCAL | `local`-prefixed |
| MESOS | `mesos`-prefixed |
| STANDALONE | `spark`-prefixed |
| YARN | `yarn` |
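The nickname resolution in the table above can be sketched as a prefix match. The function name and string results here are illustrative stand-ins (Spark internally uses integer constants for the cluster managers):

```scala
// Illustrative sketch of master-URL-to-cluster-manager resolution;
// not Spark's actual implementation.
def clusterManager(master: String): String = master match {
  case m if m.startsWith("k8s://") => "KUBERNETES"
  case m if m.startsWith("local")  => "LOCAL"
  case m if m.startsWith("mesos")  => "MESOS"
  case m if m.startsWith("spark")  => "STANDALONE"
  case "yarn"                      => "YARN"
  case m => throw new IllegalArgumentException(s"Unknown master URL: $m")
}
```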
## Launching Standalone Application

```scala
main(
  args: Array[String]): Unit
```

`main` creates a `SparkSubmit` to doSubmit (with the given `args`).
## doSubmit

```scala
doSubmit(
  args: Array[String]): Unit
```

`doSubmit` initializeLogIfNecessary.

`doSubmit` parses the arguments in the given `args` (that gives a `SparkSubmitArguments`).

With verbose option on, `doSubmit` prints out the `appArgs` to standard output.

`doSubmit` branches off based on action.
| Action | Handler |
|---|---|
| SUBMIT | submit |
| KILL | kill |
| REQUEST_STATUS | requestStatus |
| PRINT_VERSION | printVersion |
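The dispatch table above can be sketched as a pattern match. The enumeration and the string results are local stand-ins for Spark's `SparkSubmitAction` and the four handler methods:

```scala
// Stand-in for Spark's internal SparkSubmitAction enumeration.
object SparkSubmitAction extends Enumeration {
  val SUBMIT, KILL, REQUEST_STATUS, PRINT_VERSION = Value
}

// Returns the name of the handler that doSubmit would invoke (sketch only).
def handlerFor(action: SparkSubmitAction.Value): String = action match {
  case SparkSubmitAction.SUBMIT         => "submit"
  case SparkSubmitAction.KILL           => "kill"
  case SparkSubmitAction.REQUEST_STATUS => "requestStatus"
  case SparkSubmitAction.PRINT_VERSION  => "printVersion"
}
```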
`doSubmit` is used when:

* `InProcessSparkSubmit` standalone application is started
* `SparkSubmit` standalone application is started
## Parsing Arguments

```scala
parseArguments(
  args: Array[String]): SparkSubmitArguments
```

`parseArguments` creates a `SparkSubmitArguments` (with the given `args`).
## prepareSubmitEnvironment

```scala
prepareSubmitEnvironment(
  args: SparkSubmitArguments,
  conf: Option[HadoopConfiguration] = None): (Seq[String], Seq[String], SparkConf, String)
```

`prepareSubmitEnvironment` creates a 4-element tuple made up of the following:

1. `childArgs` for arguments
1. `childClasspath` for Classpath elements
1. `sysProps` for Spark properties
1. `childMainClass`

Tip: Use the `--verbose` command-line option to have the elements of the tuple printed out to standard output.
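The shape of the result can be shown with a stub standing in for the real method. The stub body, its values, and the `SparkConf` stand-in are assumptions for illustration:

```scala
// SparkConf stand-in so the sketch runs without Spark on the classpath.
type SparkConfStub = Map[String, String]

// Hypothetical stub with the same result shape as prepareSubmitEnvironment.
def prepareSubmitEnvironmentStub(): (Seq[String], Seq[String], SparkConfStub, String) =
  (Seq("--arg1"), Seq("app.jar"), Map("spark.master" -> "local[*]"), "org.example.Main")

// The 4-element tuple is typically destructured in one step.
val (childArgs, childClasspath, sparkConf, childMainClass) =
  prepareSubmitEnvironmentStub()
```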
`prepareSubmitEnvironment`...FIXME

For isPython in `CLIENT` deploy mode, `prepareSubmitEnvironment` sets the following based on primaryResource:

* For `pyspark-shell`, the mainClass is `org.apache.spark.api.python.PythonGatewayServer`
* Otherwise, the mainClass is `org.apache.spark.deploy.PythonRunner`, and the main python file, extra python files and the `childArgs`
`prepareSubmitEnvironment`...FIXME

`prepareSubmitEnvironment` determines the cluster manager based on the master argument.

For `KUBERNETES`, `prepareSubmitEnvironment` checkAndGetK8sMasterUrl.

`prepareSubmitEnvironment`...FIXME

`prepareSubmitEnvironment` is used when...FIXME
### childMainClass

`childMainClass` is the 4th (and last) element in the result tuple of prepareSubmitEnvironment.

```scala
// (childArgs, childClasspath, sparkConf, childMainClass)
(Seq[String], Seq[String], SparkConf, String)
```

`childMainClass` can be as follows (based on the deployMode):
| Deploy Mode | Master URL | childMainClass |
|---|---|---|
| client | any | mainClass |
| cluster | KUBERNETES | KubernetesClientApplication |
| cluster | MESOS | RestSubmissionClientApp (for REST submission API) |
| cluster | STANDALONE | RestSubmissionClientApp (for REST submission API) |
| cluster | STANDALONE | ClientApp |
| cluster | YARN | YarnClusterApplication |
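The table above can be sketched as a selection function. The fully-qualified class names follow the table; the match itself, including the `useRest` guard for the REST submission API, is an illustration rather than Spark's actual code:

```scala
// Illustrative sketch of childMainClass selection; not Spark's actual code.
def childMainClassFor(
    deployMode: String,
    manager: String,
    mainClass: String,
    useRest: Boolean): String =
  (deployMode, manager) match {
    case ("client", _) => mainClass
    case ("cluster", "KUBERNETES") =>
      "org.apache.spark.deploy.k8s.submit.KubernetesClientApplication"
    case ("cluster", "STANDALONE" | "MESOS") if useRest =>
      "org.apache.spark.deploy.rest.RestSubmissionClientApp"
    case ("cluster", "STANDALONE") =>
      "org.apache.spark.deploy.ClientApp"
    case ("cluster", "YARN") =>
      "org.apache.spark.deploy.yarn.YarnClusterApplication"
    case _ => mainClass
  }
```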
### isKubernetesClient

`prepareSubmitEnvironment` uses the `isKubernetesClient` flag to indicate that:

* the cluster manager is `KUBERNETES`
* the deploy mode is `CLIENT`
### isKubernetesClusterModeDriver

`prepareSubmitEnvironment` uses the `isKubernetesClusterModeDriver` flag to indicate that:

* isKubernetesClient
* `spark.kubernetes.submitInDriver` configuration property is enabled (Spark on Kubernetes)
## renameResourcesToLocalFS

```scala
renameResourcesToLocalFS(
  resources: String,
  localResources: String): String
```

`renameResourcesToLocalFS`...FIXME

`renameResourcesToLocalFS` is used for isKubernetesClusterModeDriver mode.
## downloadResource

```scala
downloadResource(
  resource: String): String
```

`downloadResource`...FIXME
## Checking Whether Resource is Internal

```scala
isInternal(
  res: String): Boolean
```

`isInternal` is `true` when the given `res` is `spark-internal`.

`isInternal` is used when:

* `SparkSubmit` is requested to isUserJar
* `SparkSubmitArguments` is requested to handleUnknown
## isUserJar

```scala
isUserJar(
  res: String): Boolean
```

`isUserJar` is `true` when the given `res` is none of the following:

* isShell
* isPython
* isInternal
* isR

`isUserJar` is used when:

* FIXME
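The predicate can be sketched together with the helper checks it negates. The helper definitions are inlined here as assumptions so the example is self-contained:

```scala
// Inlined, assumed definitions of the helper predicates (sketch only).
def isShell(res: String): Boolean =
  Set("spark-shell", "pyspark-shell", "sparkr-shell")(res)
def isPython(res: String): Boolean = res.endsWith(".py") || res == "pyspark-shell"
def isR(res: String): Boolean = res.endsWith(".R") || res == "sparkr-shell"
def isInternal(res: String): Boolean = res == "spark-internal"

// A user jar is a primary resource that is none of the special resources.
def isUserJar(res: String): Boolean =
  !isShell(res) && !isPython(res) && !isInternal(res) && !isR(res)
```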
## isPython

```scala
isPython(
  res: String): Boolean
```

`isPython` is positive (`true`) when the given `res` primary resource represents a PySpark application:

* `.py` script
* `pyspark-shell`

`isPython` is used when:

* `SparkSubmit` is requested to isUserJar
* `SparkSubmitArguments` is requested to handle an unknown option