SparkSubmitArguments

SparkSubmitArguments is created when SparkSubmit is requested to parseArguments.

SparkSubmitArguments is a custom SparkSubmitArgumentsParser that handles the command-line arguments of the spark-submit script, which the actions use for execution (possibly with an explicit env environment).

SparkSubmitArguments is created when the spark-submit script is launched, with only args passed in, and is later used for printing the arguments in verbose mode.

Creating Instance

SparkSubmitArguments takes the following to be created:

  • Arguments (Seq[String])
  • Environment Variables (default: sys.env)

SparkSubmitArguments is created when:

  • SparkSubmit is requested to parseArguments

Action

action: SparkSubmitAction

action is used by SparkSubmit to determine what to do when executed.

action can be one of the following SparkSubmitActions:

Action          Description
SUBMIT          The default action if none specified
KILL            Indicates --kill switch
REQUEST_STATUS  Indicates --status switch
PRINT_VERSION   Indicates --version switch

action is undefined (null) by default (when SparkSubmitArguments is created).

action is validated when SparkSubmitArguments is requested to validateArguments.
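
The relationship between the switches and the action can be sketched in plain Scala. This is a simplified model, not Spark's code: ActionSketch, fromSwitch and resolveAction are hypothetical names, and letting the last switch win is an assumption of the sketch. Only the action names and the switches come from the table above.

```scala
// A simplified, hypothetical model of how the action is resolved.
object ActionSketch {
  // Mirrors the action names listed above (not Spark's own type).
  object Action extends Enumeration {
    val SUBMIT, KILL, REQUEST_STATUS, PRINT_VERSION = Value
  }

  // Maps a command-line switch to an action, if the switch selects one.
  def fromSwitch(switch: String): Option[Action.Value] = switch match {
    case "--kill"    => Some(Action.KILL)
    case "--status"  => Some(Action.REQUEST_STATUS)
    case "--version" => Some(Action.PRINT_VERSION)
    case _           => None
  }

  def resolveAction(args: Seq[String]): Action.Value = {
    // Stays null (undefined) until a switch selects an action.
    var action: Action.Value = null
    // Assumption of this sketch: the last matching switch wins.
    args.foreach(arg => fromSwitch(arg).foreach(a => action = a))
    // SUBMIT is the default when no switch selected an action.
    Option(action).getOrElse(Action.SUBMIT)
  }
}
```

For example, ActionSketch.resolveAction(Seq("--class", "Main", "app.jar")) falls back to SUBMIT because none of the arguments selects an action.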

Command-Line Options

--files

  • Configuration Property: spark.files
  • Configuration Property (Spark on YARN): spark.yarn.dist.files

Printed out to standard output for --verbose option

When SparkSubmit is requested to prepareSubmitEnvironment, the files are:

Loading Spark Properties

loadEnvironmentArguments(): Unit

loadEnvironmentArguments loads the Spark properties for the current execution of spark-submit.

loadEnvironmentArguments reads command-line options first, followed by Spark properties and then the system's environment variables.

Note

Spark configuration properties start with the spark. prefix and can be set using the --conf [key=value] command-line option.
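
The lookup order described above (command-line option, then Spark property, then environment variable) can be sketched as a chain of fallbacks. This is a minimal illustration, not Spark's implementation: PrecedenceSketch and resolve are hypothetical names, and the spark.master property and MASTER variable in the usage note are only example keys.

```scala
// A hypothetical sketch of the three-level lookup order.
object PrecedenceSketch {
  def resolve(
      cliValue: String,                     // null when the option was not given
      sparkProperties: Map[String, String], // e.g. values set with --conf
      propertyKey: String,
      env: Map[String, String],             // e.g. sys.env
      envKey: String): Option[String] =
    Option(cliValue)                          // 1. command-line option
      .orElse(sparkProperties.get(propertyKey)) // 2. Spark property
      .orElse(env.get(envKey))                  // 3. environment variable
}
```

With no command-line value, PrecedenceSketch.resolve(null, Map("spark.master" -> "yarn"), "spark.master", Map("MASTER" -> "local[*]"), "MASTER") picks the Spark property over the environment variable.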

Option Handling

SparkSubmitOptionParser
handle(
  opt: String,
  value: String): Boolean

handle is part of the SparkSubmitOptionParser abstraction.

handle parses the input opt argument and assigns the given value to the corresponding property.

In the end, handle returns true unless the action is PRINT_VERSION.

User Option (opt)  Property
--kill             action
--name             name
--status           action
--version          action
...                ...
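
The option-to-property mapping can be sketched as a pattern match. This is a hedged illustration only: HandleSketch and ParsedArguments are hypothetical names, actions are modeled as plain strings, and all options other than the four listed above are ignored by the sketch.

```scala
// A hypothetical, much-reduced model of option handling.
object HandleSketch {
  final class ParsedArguments {
    var action: String = null // stays null until an action switch is seen
    var name: String = null
  }

  def handle(parsed: ParsedArguments, opt: String, value: String): Boolean = {
    opt match {
      case "--kill"    => parsed.action = "KILL"
      case "--status"  => parsed.action = "REQUEST_STATUS"
      case "--version" => parsed.action = "PRINT_VERSION"
      case "--name"    => parsed.name = value
      case _           => () // remaining options are elided in this sketch
    }
    // false only when the action is PRINT_VERSION
    parsed.action != "PRINT_VERSION"
  }
}
```

In this sketch, handling --name stores the value and returns true, while handling --version returns false.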

mergeDefaultSparkProperties

mergeDefaultSparkProperties(): Unit

mergeDefaultSparkProperties merges Spark properties from the default Spark properties file (spark-defaults.conf) with those specified through the --conf command-line option.
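
A minimal sketch of such a merge follows. MergeSketch and merge are hypothetical names, and the sketch assumes that a value given with --conf takes precedence over the same key in spark-defaults.conf; the text above does not spell out the precedence.

```scala
// A hypothetical model of merging default properties with --conf values.
object MergeSketch {
  def merge(
      defaults: Map[String, String], // as read from spark-defaults.conf
      conf: Map[String, String]      // as given with --conf
  ): Map[String, String] =
    // Assumption: on a key conflict the --conf value wins
    // (with ++, entries of the right-hand map replace those of the left).
    defaults ++ conf
}
```

Under that assumption, a default is used only for keys that were not set on the command line.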

isPython

isPython: Boolean = false

isPython indicates whether the application resource is a PySpark application (a Python script or pyspark shell).

isPython is set (using SparkSubmit.isPython) when SparkSubmitArguments is requested to handle an unknown option.

Client Deploy Mode

With isPython flag enabled, SparkSubmit determines the mainClass (and the childArgs) based on the primaryResource.

primaryResource  mainClass
pyspark-shell    org.apache.spark.api.python.PythonGatewayServer (PySpark)
anything else    org.apache.spark.deploy.PythonRunner (PySpark)
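
The table above amounts to a single branch on the primary resource. The following sketch only restates that mapping: PythonMainClassSketch and mainClassFor are hypothetical names, and the handling of childArgs is left out.

```scala
// A hypothetical restatement of the table above as a function.
object PythonMainClassSketch {
  def mainClassFor(primaryResource: String): String =
    if (primaryResource == "pyspark-shell")
      "org.apache.spark.api.python.PythonGatewayServer" // pyspark shell
    else
      "org.apache.spark.deploy.PythonRunner" // any other Python resource
}
```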