== [[SparkSubmitCommandBuilder]] SparkSubmitCommandBuilder Command Builder

SparkSubmitCommandBuilder is used to build a command that spark-submit.md#main[spark-submit] and spark-SparkLauncher.md[SparkLauncher] use to launch a Spark application.
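For context, here is a minimal sketch of the SparkLauncher side of that contract; the application jar path, main class, and master values are hypothetical:

[source, java]
----
import org.apache.spark.launcher.SparkAppHandle;
import org.apache.spark.launcher.SparkLauncher;

public class LaunchMyApp {
  public static void main(String[] args) throws Exception {
    // Every setter below becomes a SparkSubmitCommandBuilder property or a
    // --conf entry in the spark-submit command it builds.
    SparkAppHandle handle = new SparkLauncher()
      .setAppResource("/path/to/my-app.jar")   // hypothetical jar
      .setMainClass("com.example.MyApp")       // hypothetical main class
      .setMaster("local[*]")
      .setConf(SparkLauncher.DRIVER_MEMORY, "2g")
      .startApplication();
    System.out.println("Application state: " + handle.getState());
  }
}
----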

SparkSubmitCommandBuilder uses the first argument to distinguish the special modes of execution:

  1. pyspark-shell-main
  2. sparkr-shell-main
  3. run-example

CAUTION: FIXME Describe run-example

SparkSubmitCommandBuilder parses command-line arguments using OptionParser (which is a spark-submit-SparkSubmitOptionParser.md[SparkSubmitOptionParser]). OptionParser comes with the following methods (the parsing contract is sketched after this list):

  1. handle to handle the known options (see the table below). It sets up master, deployMode, propertiesFile, conf, mainClass, sparkArgs internal properties.

  2. handleUnknown to handle unrecognized options (which usually lead to an Unrecognized option error message).

  3. handleExtraArgs to handle extra arguments that are considered a Spark application's arguments.

NOTE: For spark-shell it assumes that the application arguments are after spark-submit's arguments.
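To see how the three callbacks cooperate, here is a self-contained, hypothetical sketch of the parsing contract. The real parse loop lives in spark-submit-SparkSubmitOptionParser.md[SparkSubmitOptionParser] (package-private in org.apache.spark.launcher); the MiniOptionParser class, its single --master option, and the method bodies are illustration-only assumptions.

[source, java]
----
import java.util.ArrayList;
import java.util.List;

class MiniOptionParser {
  String master;                                   // set by handle (a known option)
  final List<String> appArgs = new ArrayList<>();  // filled by handleUnknown / handleExtraArgs

  void parse(List<String> args) {
    int idx = 0;
    for (; idx < args.size(); idx++) {
      String arg = args.get(idx);
      if ("--master".equals(arg)) {
        handle(arg, args.get(++idx));   // handle: a recognized option with a value
      } else if (!handleUnknown(arg)) {
        break;                          // stop option parsing here
      }
    }
    if (idx < args.size()) {
      idx++;                            // the stopping argument was already consumed
    }
    handleExtraArgs(args.subList(idx, args.size()));
  }

  void handle(String opt, String value) { master = value; }

  // An unrecognized argument (e.g. the application resource); stop parsing.
  boolean handleUnknown(String opt) { appArgs.add(opt); return false; }

  // Everything after the stopping argument: the application's arguments.
  void handleExtraArgs(List<String> extra) { appArgs.addAll(extra); }
}
----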

=== [[buildCommand]] SparkSubmitCommandBuilder.buildCommand / buildSparkSubmitCommand

[source, java]
----
public List<String> buildCommand(Map<String, String> env)
----

NOTE: buildCommand is part of the spark-AbstractCommandBuilder.md[AbstractCommandBuilder] public API.

SparkSubmitCommandBuilder.buildCommand simply passes calls on to the <<buildSparkSubmitCommand, buildSparkSubmitCommand>> private method (unless it was executed for the pyspark or sparkr scripts, which are not covered in this document).
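The dispatch looks roughly like this (paraphrased from the Spark sources; constant names vary slightly across Spark versions):

[source, java]
----
@Override
public List<String> buildCommand(Map<String, String> env)
    throws IOException, IllegalArgumentException {
  // pyspark-shell-main and sparkr-shell-main take dedicated code paths;
  // everything else ends up in buildSparkSubmitCommand.
  if (PYSPARK_SHELL_MAIN.equals(appResource)) {
    return buildPySparkShellCommand(env);
  } else if (SPARKR_SHELL_MAIN.equals(appResource)) {
    return buildSparkRCommand(env);
  } else {
    return buildSparkSubmitCommand(env);
  }
}
----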

==== [[buildSparkSubmitCommand]] buildSparkSubmitCommand Internal Method

[source, java]
----
private List<String> buildSparkSubmitCommand(Map<String, String> env)
----

buildSparkSubmitCommand starts by <<getEffectiveConfig, building the effective config>>. When in <<isClientMode, client mode>>, buildSparkSubmitCommand adds spark-driver.md#spark_driver_extraClassPath[spark.driver.extraClassPath] to the result Spark command.

NOTE: Use spark-submit to have spark-driver.md#spark_driver_extraClassPath[spark.driver.extraClassPath] in effect.

buildSparkSubmitCommand spark-AbstractCommandBuilder.md#buildJavaCommand[builds the first part of the Java command] passing in the extra classpath (only for client deploy mode).

CAUTION: FIXME Add isThriftServer case.

buildSparkSubmitCommand appends the JVM options from the SPARK_SUBMIT_OPTS and SPARK_JAVA_OPTS environment variables.

(only for client deploy mode) ...

CAUTION: FIXME Elaborate on the client deploy mode case.

addPermGenSizeOpt case...elaborate

CAUTION: FIXME Elaborate on addPermGenSizeOpt

buildSparkSubmitCommand appends org.apache.spark.deploy.SparkSubmit and the command-line arguments (using <<buildSparkSubmitArgs, buildSparkSubmitArgs>>).
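Putting the steps together, the method body looks roughly like this (paraphrased and abridged from the Spark sources; helper names such as firstNonEmpty and mergeEnvPathList come from the launcher module):

[source, java]
----
private List<String> buildSparkSubmitCommand(Map<String, String> env) throws IOException {
  Map<String, String> config = getEffectiveConfig();
  boolean isClientMode = isClientMode(config);
  // Only in client deploy mode does the driver run in the launched JVM,
  // so only then does spark.driver.extraClassPath go on its classpath.
  String extraClassPath = isClientMode ? config.get("spark.driver.extraClassPath") : null;

  List<String> cmd = buildJavaCommand(extraClassPath);
  addOptionString(cmd, System.getenv("SPARK_SUBMIT_OPTS"));
  addOptionString(cmd, System.getenv("SPARK_JAVA_OPTS"));

  if (isClientMode) {
    // Driver memory and driver JVM options must be applied to this very JVM.
    String memory = firstNonEmpty(config.get("spark.driver.memory"),
      System.getenv("SPARK_DRIVER_MEMORY"), System.getenv("SPARK_MEM"), DEFAULT_MEM);
    cmd.add("-Xmx" + memory);
    addOptionString(cmd, config.get("spark.driver.extraJavaOptions"));
    mergeEnvPathList(env, getLibPathEnvName(), config.get("spark.driver.extraLibraryPath"));
  }

  addPermGenSizeOpt(cmd);
  cmd.add("org.apache.spark.deploy.SparkSubmit");
  cmd.addAll(buildSparkSubmitArgs());
  return cmd;
}
----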

==== [[buildSparkSubmitArgs]] buildSparkSubmitArgs Method

[source, java]
----
List<String> buildSparkSubmitArgs()
----

buildSparkSubmitArgs builds a list of command-line arguments for spark-submit.md[spark-submit].

buildSparkSubmitArgs uses a spark-submit-SparkSubmitOptionParser.md[SparkSubmitOptionParser] to add the command-line arguments that spark-submit recognizes (when it is executed later on and uses the very same SparkSubmitOptionParser parser to parse command-line arguments).

.SparkSubmitCommandBuilder Properties and Corresponding SparkSubmitOptionParser Attributes
[options="header",width="100%"]
|===
| SparkSubmitCommandBuilder Property | SparkSubmitOptionParser Attribute

| verbose | VERBOSE
| master | MASTER [master]
| deployMode | DEPLOY_MODE [deployMode]
| appName | NAME [appName]
| conf | CONF [key=value]*
| propertiesFile | PROPERTIES_FILE [propertiesFile]
| jars | JARS [comma-separated jars]
| files | FILES [comma-separated files]
| pyFiles | PY_FILES [comma-separated pyFiles]
| mainClass | CLASS [mainClass]
| sparkArgs | sparkArgs (passed straight through)
| appResource | appResource (passed straight through)
| appArgs | appArgs (passed straight through)
|===
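An abridged excerpt shows the pattern (paraphrased from the Spark sources; only a few of the properties from the table are shown):

[source, java]
----
List<String> buildSparkSubmitArgs() {
  List<String> args = new ArrayList<>();
  SparkSubmitOptionParser parser = new SparkSubmitOptionParser();

  if (master != null) {
    args.add(parser.MASTER);   // "--master"
    args.add(master);
  }
  if (mainClass != null) {
    args.add(parser.CLASS);    // "--class"
    args.add(mainClass);
  }
  for (Map.Entry<String, String> e : conf.entrySet()) {
    args.add(parser.CONF);     // "--conf"
    args.add(String.format("%s=%s", e.getKey(), e.getValue()));
  }

  args.addAll(sparkArgs);      // passed straight through
  if (appResource != null) {
    args.add(appResource);
  }
  args.addAll(appArgs);
  return args;
}
----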

==== [[getEffectiveConfig]] getEffectiveConfig Internal Method

[source, java]
----
Map<String, String> getEffectiveConfig()
----

The getEffectiveConfig internal method builds effectiveConfig, i.e. conf with the Spark properties file loaded (using the spark-AbstractCommandBuilder.md#loadPropertiesFile[loadPropertiesFile] internal method), skipping keys that have already been set (which happened when the command-line options were parsed in the <<OptionParser-handle, handle>> method).

NOTE: Command-line options (e.g. --driver-class-path) have higher precedence than their corresponding Spark settings in a Spark properties file (e.g. spark.driver.extraClassPath). You can therefore control the final settings by overriding Spark settings on the command line using the command-line options. The properties file itself is loaded using the UTF-8 charset, and white spaces around values are trimmed.
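The merge logic is straightforward (paraphrased from the Spark sources): properties-file entries never overwrite keys already set from command-line options.

[source, java]
----
Map<String, String> getEffectiveConfig() throws IOException {
  if (effectiveConfig == null) {
    effectiveConfig = new HashMap<>(conf);          // command-line options win
    Properties p = loadPropertiesFile();            // from AbstractCommandBuilder
    for (String key : p.stringPropertyNames()) {
      if (!effectiveConfig.containsKey(key)) {
        effectiveConfig.put(key, p.getProperty(key));
      }
    }
  }
  return effectiveConfig;
}
----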

==== [[isClientMode]] isClientMode Internal Method

[source, java]
----
private boolean isClientMode(Map<String, String> userProps)
----

isClientMode checks master first (from the command-line options) and falls back to the spark.master Spark property. It does the same with deployMode and spark.submit.deployMode.

CAUTION: FIXME Review master and deployMode. How are they set?

isClientMode responds positive when no master is set explicitly or the deploy mode is client (whether set explicitly or left to the default).
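A sketch of the decision (paraphrased from the Spark 2.x sources):

[source, java]
----
private boolean isClientMode(Map<String, String> userProps) {
  String userMaster = firstNonEmpty(master, userProps.get("spark.master"));
  String userDeployMode = firstNonEmpty(deployMode, userProps.get("spark.submit.deployMode"));
  // The default master is local[*], which implies client deploy mode.
  return userMaster == null
    || "client".equals(userDeployMode)
    || (!userMaster.equals("yarn-cluster") && userDeployMode == null);
}
----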

=== [[OptionParser]] OptionParser

OptionParser is a custom spark-submit-SparkSubmitOptionParser.md[SparkSubmitOptionParser] that SparkSubmitCommandBuilder uses to parse command-line arguments. It defines all the spark-submit-SparkSubmitOptionParser.md#callbacks[SparkSubmitOptionParser callbacks], i.e. <<OptionParser-handle, handle>>, <<OptionParser-handleUnknown, handleUnknown>>, and <<OptionParser-handleExtraArgs, handleExtraArgs>>, for command-line argument handling.

==== [[OptionParser-handle]] OptionParser's handle Callback

[source, java]
----
boolean handle(String opt, String value)
----

OptionParser comes with a custom handle callback (from the spark-submit-SparkSubmitOptionParser.md#callbacks[SparkSubmitOptionParser callbacks]).

.handle Method
[options="header",width="100%"]
|===
| Command-Line Option | Property / Behaviour

| --master | master
| --deploy-mode | deployMode
| --properties-file | propertiesFile
| --driver-memory | Sets spark.driver.memory (in conf)
| --driver-java-options | Sets spark.driver.extraJavaOptions (in conf)
| --driver-library-path | Sets spark.driver.extraLibraryPath (in conf)
| --driver-class-path | Sets spark.driver.extraClassPath (in conf)
| --conf | Expects a key=value pair that it puts in conf
| --class | Sets mainClass (in conf).

It may also set allowsMixedArguments and appResource if the execution is for one of the special classes, i.e. spark-shell.md[spark-shell], SparkSQLCLIDriver, or spark-sql-thrift-server.md[HiveThriftServer2].
| --kill \| --status | Disables isAppResourceReq and adds itself with the value to sparkArgs.
| --help \| --usage-error | Disables isAppResourceReq and adds itself to sparkArgs.
| --version | Disables isAppResourceReq and adds itself to sparkArgs.
| anything else | Adds an element to sparkArgs
|===
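An abridged sketch of the callback (paraphrased from the Spark sources; most options from the table are omitted):

[source, java]
----
@Override
protected boolean handle(String opt, String value) {
  if (opt.equals(MASTER)) {
    master = value;
  } else if (opt.equals(DEPLOY_MODE)) {
    deployMode = value;
  } else if (opt.equals(CONF)) {
    String[] setConf = value.split("=", 2);   // a key=value pair
    conf.put(setConf[0], setConf[1]);
  } else if (opt.equals(CLASS)) {
    // The special classes (e.g. spark-shell) allow mixing spark-submit
    // arguments with application arguments.
    mainClass = value;
    if (specialClasses.containsKey(value)) {
      allowsMixedArguments = true;
      appResource = specialClasses.get(value);
    }
  } else if (opt.equals(VERSION)) {
    isAppResourceReq = false;
    sparkArgs.add(opt);
  } else {
    sparkArgs.add(opt);                        // anything else
    if (value != null) {
      sparkArgs.add(value);
    }
  }
  return true;
}
----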

==== [[OptionParser-handleUnknown]] OptionParser's handleUnknown Method

[source, java]
----
boolean handleUnknown(String opt)
----

If allowsMixedArguments is enabled, handleUnknown simply adds the input opt to appArgs and allows for further spark-submit-SparkSubmitOptionParser.md#parse[parsing of the argument list].

CAUTION: FIXME Where's allowsMixedArguments enabled?

If isExample is enabled, handleUnknown sets mainClass to be org.apache.spark.examples.[opt] (unless the input opt already has the package prefix) and stops further spark-submit-SparkSubmitOptionParser.md#parse[parsing of the argument list].

CAUTION: FIXME Where's isExample enabled?

Otherwise, handleUnknown sets appResource and stops further spark-submit-SparkSubmitOptionParser.md#parse[parsing of the argument list].
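The three cases above correspond to the following shape of the method (paraphrased from the Spark sources):

[source, java]
----
@Override
protected boolean handleUnknown(String opt) {
  if (allowsMixedArguments) {
    // e.g. spark-shell: unrecognized options are application arguments.
    appArgs.add(opt);
    return true;   // keep parsing
  } else if (isExample) {
    // run-example: resolve the example class name.
    String className = opt;
    if (!className.startsWith(EXAMPLE_CLASS_PREFIX)) {
      className = EXAMPLE_CLASS_PREFIX + className;   // "org.apache.spark.examples."
    }
    mainClass = className;
    return false;  // stop parsing; the rest are application arguments
  } else {
    // The first unrecognized argument is the application resource (e.g. a jar).
    appResource = opt;
    return false;  // stop parsing
  }
}
----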

==== [[OptionParser-handleExtraArgs]] OptionParser's handleExtraArgs Method

[source, java]
----
void handleExtraArgs(List<String> extra)
----

handleExtraArgs adds all the extra arguments to appArgs.
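That boils down to a one-liner (paraphrased from the Spark sources):

[source, java]
----
@Override
protected void handleExtraArgs(List<String> extra) {
  appArgs.addAll(extra);
}
----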

