SparkSubmit

SparkSubmit is the entry point of the spark-submit shell script.

Special Primary Resource Names

SparkSubmit uses the following special primary resource names to represent Spark shells rather than application jars:

pyspark-shell

SparkSubmit uses pyspark-shell when:

  • FIXME

isShell

isShell(
  res: String): Boolean

isShell is true when the given res primary resource represents a Spark shell.
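The predicate can be sketched as follows. This is a standalone illustration, not Spark's code; the full set of shell resource names (spark-shell and sparkr-shell alongside the pyspark-shell described above) is an assumption.

```scala
// Minimal sketch of the isShell predicate: a primary resource "name"
// that denotes a Spark shell rather than an application jar.
object ShellCheck {
  // Assumed special shell resource names (only pyspark-shell is
  // documented on this page; the other two are assumptions).
  val SparkShell   = "spark-shell"
  val PySparkShell = "pyspark-shell"
  val SparkRShell  = "sparkr-shell"

  def isShell(res: String): Boolean =
    res == SparkShell || res == PySparkShell || res == SparkRShell
}
```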

isShell is used when:

  • FIXME

Actions

SparkSubmit executes actions (based on the action argument).

Killing Submission

kill(
  args: SparkSubmitArguments): Unit

kill...FIXME

Displaying Version

printVersion(): Unit

printVersion...FIXME

Submission Status

requestStatus(
  args: SparkSubmitArguments): Unit

requestStatus...FIXME

Submission

submit(
  args: SparkSubmitArguments,
  uninitLog: Boolean): Unit

submit...FIXME

Running Main Class

runMain(
  args: SparkSubmitArguments,
  uninitLog: Boolean): Unit

runMain prepares the submit environment (prepareSubmitEnvironment) with the given SparkSubmitArguments (that gives a 4-element tuple of childArgs, childClasspath, sparkConf and childMainClass).

With verbose enabled, runMain prints out the following INFO messages to the logs:

Main class:
[childMainClass]
Arguments:
[childArgs]
Spark config:
[sparkConf_redacted]
Classpath elements:
[childClasspath]

runMain creates and sets a context classloader (based on spark.driver.userClassPathFirst configuration property) and adds the jars (from childClasspath).

runMain loads the main class (childMainClass).

runMain creates a SparkApplication (if the main class is a subtype of SparkApplication) or a JavaMainApplication (with the main class).

In the end, runMain requests the SparkApplication to start (with the childArgs and sparkConf).
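The class-loading and application-wrapping steps above can be sketched outside Spark with a stand-in trait (SparkApp, JavaMainApp and AppLoader below are illustrative names, not Spark's):

```scala
// Stand-in for org.apache.spark.deploy.SparkApplication, to show how
// runMain decides between the two application kinds.
trait SparkApp { def start(args: Array[String]): Unit }

// Wraps a plain class with a static main(Array[String]) method,
// in the spirit of JavaMainApplication.
class JavaMainApp(mainClass: Class[_]) extends SparkApp {
  def start(args: Array[String]): Unit = {
    val main = mainClass.getMethod("main", classOf[Array[String]])
    main.invoke(null, args) // static main, so no receiver instance
  }
}

object AppLoader {
  // Instantiate the class directly when it is a SparkApp;
  // otherwise wrap its main method.
  def load(mainClass: Class[_]): SparkApp =
    if (classOf[SparkApp].isAssignableFrom(mainClass))
      mainClass.getDeclaredConstructor().newInstance().asInstanceOf[SparkApp]
    else
      new JavaMainApp(mainClass)
}

// demo application used to exercise the SparkApp branch
class HelloApp extends SparkApp {
  def start(args: Array[String]): Unit = println("hello " + args.mkString(" "))
}
```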

Cluster Managers

SparkSubmit has built-in support for several cluster managers (selected based on the master argument).

Nickname     Master URL
-----------  --------------------
KUBERNETES   k8s://-prefixed URLs
LOCAL        local-prefixed URLs
MESOS        mesos-prefixed URLs
STANDALONE   spark-prefixed URLs
YARN         yarn
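The selection in the table above can be sketched as a prefix match on the master argument (the exact matching rules in Spark are more involved; this is an assumption-laden illustration):

```scala
// Sketch of deriving the cluster manager from the master argument,
// following the nickname table above.
object ClusterManager extends Enumeration {
  val Kubernetes, Local, Mesos, Standalone, Yarn = Value

  def from(master: String): Value = master match {
    case m if m.startsWith("k8s://") => Kubernetes
    case m if m.startsWith("local")  => Local
    case m if m.startsWith("mesos")  => Mesos
    case m if m.startsWith("spark")  => Standalone
    case "yarn"                      => Yarn
    case m => throw new IllegalArgumentException(s"Unknown master URL: $m")
  }
}
```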

Launching Standalone Application

main(
  args: Array[String]): Unit

main...FIXME

doSubmit

doSubmit(
  args: Array[String]): Unit

doSubmit...FIXME

doSubmit is used when:

  • InProcessSparkSubmit standalone application is started
  • SparkSubmit standalone application is started

prepareSubmitEnvironment

prepareSubmitEnvironment(
  args: SparkSubmitArguments,
  conf: Option[HadoopConfiguration] = None): (Seq[String], Seq[String], SparkConf, String)

prepareSubmitEnvironment creates a 4-element tuple made up of the following:

  1. childArgs for arguments
  2. childClasspath for Classpath elements
  3. sparkConf for Spark properties
  4. childMainClass

Tip

Use --verbose command-line option to have the elements of the tuple printed out to the standard output.

prepareSubmitEnvironment...FIXME

For isPython in CLIENT deploy mode, prepareSubmitEnvironment sets the following based on primaryResource:

  • For pyspark-shell the mainClass is org.apache.spark.api.python.PythonGatewayServer

  • Otherwise, the mainClass is org.apache.spark.deploy.PythonRunner, and the childArgs include the main Python file and the extra Python files
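The two cases above can be sketched as a simple selection on the primary resource (the class names come from the text; the helper below is illustrative, not Spark's):

```scala
// Sketch of choosing the main class for a PySpark application in
// client deploy mode, per the two cases described above.
object PySparkMain {
  def mainClassFor(primaryResource: String): String =
    if (primaryResource == "pyspark-shell")
      "org.apache.spark.api.python.PythonGatewayServer"
    else
      "org.apache.spark.deploy.PythonRunner"
}
```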

prepareSubmitEnvironment...FIXME

prepareSubmitEnvironment determines the cluster manager based on master argument.

For KUBERNETES, prepareSubmitEnvironment validates the Kubernetes master URL (checkAndGetK8sMasterUrl).

prepareSubmitEnvironment...FIXME

prepareSubmitEnvironment is used when...FIXME

childMainClass

childMainClass is the 4th (and last) element of the result tuple of prepareSubmitEnvironment.

// (childArgs, childClasspath, sparkConf, childMainClass)
(Seq[String], Seq[String], SparkConf, String)

childMainClass can be as follows:

Deploy Mode  Master URL  childMainClass
-----------  ----------  -------------------------------------------------
client       any         mainClass
cluster      KUBERNETES  KubernetesClientApplication
cluster      MESOS       RestSubmissionClientApp (for REST submission API)
cluster      STANDALONE  RestSubmissionClientApp (for REST submission API)
cluster      STANDALONE  ClientApp
cluster      YARN        YarnClusterApplication
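The table above can be sketched as a pattern match. The fully qualified class names and the useRest flag (choosing between the REST submission client and the legacy ClientApp for standalone clusters) are assumptions not stated on this page:

```scala
// Sketch of the childMainClass selection per the table above.
object ChildMainClass {
  def select(
      deployMode: String,
      master: String,
      mainClass: String,
      useRest: Boolean = true): String =
    (deployMode, master) match {
      case ("client", _) => mainClass
      case ("cluster", m) if m.startsWith("k8s://") =>
        "org.apache.spark.deploy.k8s.submit.KubernetesClientApplication"
      case ("cluster", m) if m.startsWith("mesos") =>
        "org.apache.spark.deploy.rest.RestSubmissionClientApp"
      case ("cluster", m) if m.startsWith("spark") =>
        if (useRest) "org.apache.spark.deploy.rest.RestSubmissionClientApp"
        else "org.apache.spark.deploy.ClientApp"
      case ("cluster", "yarn") =>
        "org.apache.spark.deploy.yarn.YarnClusterApplication"
      case other =>
        throw new IllegalArgumentException(s"Unsupported: $other")
    }
}
```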

isKubernetesClient

prepareSubmitEnvironment uses the isKubernetesClient flag to indicate that:

  • FIXME

isKubernetesClusterModeDriver

prepareSubmitEnvironment uses the isKubernetesClusterModeDriver flag to indicate that:

  • FIXME

renameResourcesToLocalFS

renameResourcesToLocalFS(
  resources: String,
  localResources: String): String

renameResourcesToLocalFS...FIXME

renameResourcesToLocalFS is used for isKubernetesClusterModeDriver mode.

downloadResource

downloadResource(
  resource: String): String

downloadResource...FIXME

Checking Whether Resource is Internal

isInternal(
  res: String): Boolean

isInternal is true when the given res is spark-internal.

isInternal is used when:

  • FIXME

isUserJar

isUserJar(
  res: String): Boolean

isUserJar is true when the given res is none of the following: a Spark shell (isShell), a Python application (isPython), an R application, or an internal resource (isInternal).

isUserJar is used when:

  • FIXME
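isUserJar can be sketched as the negation of the special-resource predicates described on this page. The .py/.R suffix checks and the sparkr-shell name are assumptions:

```scala
// Sketch: a user jar is any primary resource that is not one of the
// special resources (shell, Python, R, internal).
object ResourceKind {
  def isShell(res: String): Boolean =
    Set("spark-shell", "pyspark-shell", "sparkr-shell")(res)
  def isPython(res: String): Boolean =
    res.endsWith(".py") || res == "pyspark-shell"
  def isR(res: String): Boolean =
    res.endsWith(".R") || res == "sparkr-shell"
  def isInternal(res: String): Boolean =
    res == "spark-internal"

  def isUserJar(res: String): Boolean =
    !isShell(res) && !isPython(res) && !isInternal(res) && !isR(res)
}
```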

isPython Utility

isPython(
  res: String): Boolean

isPython is true when the given res primary resource represents a PySpark application (a .py file or pyspark-shell).

isPython is used when:

  • SparkSubmit is requested to isUserJar
  • SparkSubmitArguments is requested to handleUnknown (and set isPython internal flag)

Last update: 2021-02-21