
SparkSubmitCommandBuilder

SparkSubmitCommandBuilder is an AbstractCommandBuilder.

SparkSubmitCommandBuilder is used to build a command that spark-submit and SparkLauncher use to launch a Spark application.
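
As an illustration of where such a command comes from, here is a minimal SparkLauncher sketch. The application resource, main class, master and memory value are placeholders; launching it makes SparkLauncher build the spark-submit command with SparkSubmitCommandBuilder behind the scenes.

import org.apache.spark.launcher.SparkLauncher;

public class MyAppLauncher {
  public static void main(String[] args) throws Exception {
    // Every setter below ends up as a SparkSubmitCommandBuilder property
    // that is later rendered as spark-submit command-line arguments.
    Process spark = new SparkLauncher()
      .setAppResource("/path/to/app.jar")          // placeholder application resource
      .setMainClass("com.example.MyApp")           // placeholder main class
      .setMaster("local[*]")
      .setConf(SparkLauncher.DRIVER_MEMORY, "2g")
      .launch();
    spark.waitFor();
  }
}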

SparkSubmitCommandBuilder uses the first command-line argument to recognize the following special application resources:

  1. pyspark-shell-main
  2. sparkr-shell-main
  3. run-example

SparkSubmitCommandBuilder parses command-line arguments using OptionParser (a SparkSubmitOptionParser). OptionParser comes with the following methods (a sketch of the contract follows the list):

  1. handle to handle the known options (see the table below). It sets the master, deployMode, propertiesFile, conf, mainClass and sparkArgs internal properties.

  2. handleUnknown to handle unrecognized options that usually lead to an Unrecognized option error message.

  3. handleExtraArgs to handle extra arguments that are considered a Spark application's arguments.
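
A standalone sketch of that three-callback contract follows. It deliberately does not extend Spark's SparkSubmitOptionParser (an internal class of the org.apache.spark.launcher package); the class name, fields and sample options below are illustrative only.

import java.util.Arrays;
import java.util.List;

// Illustrative only: mimics the handle / handleUnknown / handleExtraArgs
// contract of OptionParser without extending Spark's SparkSubmitOptionParser.
public class OptionParserSketch {

  String master;
  String deployMode;
  List<String> appArgs;

  // Called for recognized options (e.g. --master, --deploy-mode);
  // the real OptionParser sets the corresponding builder properties.
  boolean handle(String opt, String value) {
    switch (opt) {
      case "--master":      master = value;     return true;
      case "--deploy-mode": deployMode = value; return true;
      default:              return true;  // other known options handled similarly
    }
  }

  // Called for options the parser does not recognize.
  boolean handleUnknown(String opt) {
    System.err.println("Unrecognized option: " + opt);
    return false;  // stop parsing
  }

  // Called with whatever is left over: the Spark application's own arguments.
  void handleExtraArgs(List<String> extra) {
    appArgs = extra;
  }

  public static void main(String[] args) {
    OptionParserSketch parser = new OptionParserSketch();
    parser.handle("--master", "local[*]");
    parser.handleExtraArgs(Arrays.asList("appArg1", "appArg2"));
    System.out.println(parser.master + " " + parser.appArgs);
  }
}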

Note

For spark-shell, it assumes that the application arguments come after spark-submit's arguments.

pyspark-shell-main Application Resource

When the bin/pyspark shell script (or bin\pyspark2.cmd) is launched, it uses bin/spark-submit with the pyspark-shell-main application resource as the first argument (followed by the --name "PySparkShell" option, among others).

pyspark-shell-main is used when:

  • bin/pyspark (and bin\pyspark2.cmd) shell scripts launch the PySpark shell

Building Command

AbstractCommandBuilder
List<String> buildCommand(
  Map<String, String> env)

buildCommand is part of the AbstractCommandBuilder abstraction.

buildCommand branches off based on the application resource (a simplified sketch follows the list):

  • pyspark-shell-main (but not isSpecialCommand): buildPySparkShellCommand
  • sparkr-shell-main (but not isSpecialCommand): buildSparkRCommand
  • anything else: buildSparkSubmitCommand
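
A simplified, self-contained sketch of this branching; the stubbed build* methods below are placeholders, not Spark's implementations.

import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified sketch of the dispatch in SparkSubmitCommandBuilder.buildCommand.
public class BuildCommandSketch {
  String appResource = "pyspark-shell-main";
  boolean isSpecialCommand = false;

  List<String> buildCommand(Map<String, String> env) {
    if ("pyspark-shell-main".equals(appResource) && !isSpecialCommand) {
      return buildPySparkShellCommand(env);
    } else if ("sparkr-shell-main".equals(appResource) && !isSpecialCommand) {
      return buildSparkRCommand(env);
    } else {
      return buildSparkSubmitCommand(env);
    }
  }

  // Placeholder builders (the real ones assemble full shell commands).
  List<String> buildPySparkShellCommand(Map<String, String> env) { return Arrays.asList("python3"); }
  List<String> buildSparkRCommand(Map<String, String> env)       { return Arrays.asList("Rscript"); }
  List<String> buildSparkSubmitCommand(Map<String, String> env)  { return Arrays.asList("java"); }

  public static void main(String[] args) {
    System.out.println(new BuildCommandSketch().buildCommand(new HashMap<>()));
  }
}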

buildPySparkShellCommand

List<String> buildPySparkShellCommand(
  Map<String, String> env)

Note

appArgs is expected to be empty.

buildPySparkShellCommand makes sure that:

  • There are no appArgs, or
  • If there are appArgs, the first argument is not a Python script (a file with the .py extension)

buildPySparkShellCommand sets the application resource to pyspark-shell.

Note

buildPySparkShellCommand is executed when a command is requested for the pyspark-shell-main application resource, which at this point is redefined (reset) to pyspark-shell.

buildPySparkShellCommand calls constructEnvVarArgs with the given env and PYSPARK_SUBMIT_ARGS (the name of the environment variable that ends up holding the spark-submit arguments).
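
A rough sketch of that step, under the assumption that the spark-submit arguments are quoted and joined into a single environment variable; the quoting below is a naive stand-in for Spark's own quoting utilities, used only for illustration.

import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustration only: serialize spark-submit arguments into PYSPARK_SUBMIT_ARGS
// so the Python process can re-parse them later.
public class EnvVarArgsSketch {
  public static void main(String[] args) {
    List<String> submitArgs =
      Arrays.asList("--master", "local[*]", "--name", "PySparkShell", "pyspark-shell");
    StringBuilder joined = new StringBuilder();
    for (String arg : submitArgs) {
      if (joined.length() > 0) joined.append(' ');
      joined.append('"').append(arg.replace("\"", "\\\"")).append('"');  // naive quoting
    }
    Map<String, String> env = new HashMap<>();
    env.put("PYSPARK_SUBMIT_ARGS", joined.toString());
    System.out.println(env.get("PYSPARK_SUBMIT_ARGS"));
  }
}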

buildPySparkShellCommand defines an internal pyargs collection for the parts of the shell command to execute.

buildPySparkShellCommand stores the Python executable (in pyargs), picking the first one specified in the following order (see the sketch after this list):

  • spark.pyspark.driver.python configuration property
  • spark.pyspark.python configuration property
  • PYSPARK_DRIVER_PYTHON environment variable
  • PYSPARK_PYTHON environment variable
  • python3
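
A minimal sketch of that resolution order; firstNonEmpty below is a hypothetical helper, not Spark's own utility.

import java.util.HashMap;
import java.util.Map;

// Illustration only: pick the first non-empty candidate for the Python executable.
public class PythonExecutableSketch {

  // Hypothetical helper: returns the first candidate that is neither null nor empty.
  static String firstNonEmpty(String... candidates) {
    for (String c : candidates) {
      if (c != null && !c.isEmpty()) return c;
    }
    return null;
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();  // stands in for the effective Spark configuration
    String pythonExec = firstNonEmpty(
        conf.get("spark.pyspark.driver.python"),
        conf.get("spark.pyspark.python"),
        System.getenv("PYSPARK_DRIVER_PYTHON"),
        System.getenv("PYSPARK_PYTHON"),
        "python3");                              // the default
    System.out.println(pythonExec);              // first non-empty candidate wins
  }
}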

buildPySparkShellCommand sets the following environment variables (for the Python executable to use), if specified:

  • PYSPARK_PYTHON: from the spark.pyspark.python configuration property
  • SPARK_REMOTE: from the remote option or the spark.remote configuration property

In the end, buildPySparkShellCommand copies all the options from the PYSPARK_DRIVER_PYTHON_OPTS environment variable (to pyargs), if specified.

buildSparkSubmitCommand

List<String> buildSparkSubmitCommand(
  Map<String, String> env)

buildSparkSubmitCommand starts by building the so-called effective config. When in client deploy mode, buildSparkSubmitCommand adds spark.driver.extraClassPath to the resulting Spark command.

buildSparkSubmitCommand builds the first part of the Java command passing in the extra classpath (only for client deploy mode).

FIXME: Add the isThriftServer case.

buildSparkSubmitCommand appends the values of the SPARK_SUBMIT_OPTS and SPARK_JAVA_OPTS environment variables.

(only for client deploy mode) ...

FIXME: Elaborate on the client deploy mode case.

FIXME: Elaborate on the addPermGenSizeOpt case.

buildSparkSubmitCommand appends org.apache.spark.deploy.SparkSubmit and the command-line arguments (using buildSparkSubmitArgs).
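
A hedged, simplified sketch of that assembly; the values below are placeholders, and the real method also handles driver memory, the isThriftServer case and related details not shown here.

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified sketch of the command assembly in buildSparkSubmitCommand.
public class SparkSubmitCommandSketch {
  public static void main(String[] args) {
    boolean isClientMode = true;
    String driverExtraClassPath = "/extra/classes";           // placeholder for spark.driver.extraClassPath
    List<String> cmd = new ArrayList<>();

    cmd.add("java");                                           // stands in for the Java command part
    if (isClientMode && driverExtraClassPath != null) {
      cmd.add("-cp");
      cmd.add(driverExtraClassPath);                           // extra classpath only in client deploy mode
    }
    String submitOpts = System.getenv("SPARK_SUBMIT_OPTS");    // appended if set
    if (submitOpts != null && !submitOpts.isEmpty()) {
      cmd.addAll(Arrays.asList(submitOpts.split("\\s+")));
    }
    cmd.add("org.apache.spark.deploy.SparkSubmit");            // the main class
    cmd.addAll(Arrays.asList("--master", "local[*]", "app.jar"));  // stands in for buildSparkSubmitArgs()
    System.out.println(String.join(" ", cmd));
  }
}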

buildSparkSubmitArgs

List<String> buildSparkSubmitArgs()

buildSparkSubmitArgs builds a list of command-line arguments for spark-submit.

buildSparkSubmitArgs uses a SparkSubmitOptionParser to add the command-line arguments that spark-submit recognizes (when spark-submit is executed later on, it uses the very same SparkSubmitOptionParser to parse them).

buildSparkSubmitArgs is used when:

  • SparkSubmitCommandBuilder is requested to buildSparkSubmitCommand
  • SparkSubmitCommandBuilder is requested to constructEnvVarArgs (e.g. for PYSPARK_SUBMIT_ARGS)

SparkSubmitCommandBuilder Properties and SparkSubmitOptionParser Attributes

SparkSubmitCommandBuilder Property   SparkSubmitOptionParser Attribute

verbose                              VERBOSE
master                               MASTER [master]
deployMode                           DEPLOY_MODE [deployMode]
appName                              NAME [appName]
conf                                 CONF [key=value]*
propertiesFile                       PROPERTIES_FILE [propertiesFile]
jars                                 JARS [comma-separated jars]
files                                FILES [comma-separated files]
pyFiles                              PY_FILES [comma-separated pyFiles]
mainClass                            CLASS [mainClass]
sparkArgs                            sparkArgs (passed straight through)
appResource                          appResource (passed straight through)
appArgs                              appArgs (passed straight through)
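
Putting the mapping above together, here is a simplified sketch of how buildSparkSubmitArgs could render the builder's properties as spark-submit options. The property values are placeholders; the literal option strings correspond to the SparkSubmitOptionParser attributes listed in the table.

import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only: renders a few builder properties as spark-submit options,
// following the property-to-attribute mapping in the table above.
public class SparkSubmitArgsSketch {
  public static void main(String[] args) {
    String master = "yarn";                          // placeholder values
    String deployMode = "cluster";
    String appName = "MyApp";
    Map<String, String> conf = new LinkedHashMap<>();
    conf.put("spark.executor.memory", "2g");
    String mainClass = "com.example.Main";
    String appResource = "app.jar";
    List<String> appArgs = Arrays.asList("arg1");

    List<String> submitArgs = new ArrayList<>();
    if (master != null)     { submitArgs.add("--master");      submitArgs.add(master); }
    if (deployMode != null) { submitArgs.add("--deploy-mode"); submitArgs.add(deployMode); }
    if (appName != null)    { submitArgs.add("--name");        submitArgs.add(appName); }
    for (Map.Entry<String, String> e : conf.entrySet()) {
      submitArgs.add("--conf");
      submitArgs.add(e.getKey() + "=" + e.getValue());
    }
    if (mainClass != null)  { submitArgs.add("--class");       submitArgs.add(mainClass); }
    submitArgs.add(appResource);                     // passed straight through
    submitArgs.addAll(appArgs);                      // passed straight through
    System.out.println(submitArgs);
  }
}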