SparkSubmitArguments
SparkSubmitArguments is a custom SparkSubmitArgumentsParser that handles the command-line arguments of the spark-submit script, i.e. the arguments that SparkSubmit actions use for execution (optionally with an explicit env of environment variables).
SparkSubmitArguments is created when SparkSubmit is requested to parseArguments (with only the command-line args passed in). It is later used to print out the parsed arguments in verbose mode.
Creating Instance
SparkSubmitArguments takes the following to be created:
- Arguments (Seq[String])
- Environment Variables (default: sys.env)
SparkSubmitArguments is created when:
- SparkSubmit is requested to parseArguments
Action
action is used by SparkSubmit to determine what to do when executed.
action can be one of the following SparkSubmitActions:
| Action | Description |
|---|---|
| SUBMIT | The default action if none is specified |
| KILL | Set by the --kill switch |
| REQUEST_STATUS | Set by the --status switch |
| PRINT_VERSION | Set by the --version switch |
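action is undefined (null) when SparkSubmitArguments is created, and stays so until one of the switches above sets it. If no switch does, action defaults to SUBMIT when SparkSubmitArguments loads the environment arguments (loadEnvironmentArguments).
action is validated when SparkSubmitArguments is requested to validateArguments.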
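The following sketch illustrates how the action follows from the switches in the table. It is a toy Python model, not Spark's code. SWITCH_TO_ACTION and resolve_action are hypothetical names. The model also ignores the values that follow switches (such as submission IDs) and models only the "SUBMIT unless a switch says otherwise" rule and the stop at --version.

```python
from enum import Enum
from typing import List, Optional


class SparkSubmitAction(Enum):
    # Mirrors the four actions in the table above (names only).
    SUBMIT = "SUBMIT"
    KILL = "KILL"
    REQUEST_STATUS = "REQUEST_STATUS"
    PRINT_VERSION = "PRINT_VERSION"


# Hypothetical lookup table: command-line switch -> action it selects.
SWITCH_TO_ACTION = {
    "--kill": SparkSubmitAction.KILL,
    "--status": SparkSubmitAction.REQUEST_STATUS,
    "--version": SparkSubmitAction.PRINT_VERSION,
}


def resolve_action(switches: List[str]) -> SparkSubmitAction:
    """Pick the action selected by the switches, falling back to SUBMIT."""
    action: Optional[SparkSubmitAction] = None  # undefined (null) at creation
    for switch in switches:
        if switch in SWITCH_TO_ACTION:
            action = SWITCH_TO_ACTION[switch]
        if action is SparkSubmitAction.PRINT_VERSION:
            break  # option parsing stops once --version has been handled
    # No switch selected an action, so use the default.
    return action if action is not None else SparkSubmitAction.SUBMIT
```

For example, `resolve_action([])` gives SUBMIT, and `resolve_action(["--kill"])` gives KILL.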
Command-Line Options
--files
- Configuration Property: spark.files
- Configuration Property (Spark on YARN): spark.yarn.dist.files
- Printed out to standard output for the --verbose option
When SparkSubmit is requested to prepareSubmitEnvironment, the files are:
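Loading Spark Properties
loadEnvironmentArguments fills in the arguments of the current spark-submit execution that were not given on the command line. It takes them from the Spark properties and the system environment variables.
For every argument, loadEnvironmentArguments uses the command-line option first. It then falls back to the corresponding Spark property, and finally to the environment variable (if there is one).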
Note
Spark configuration properties start with the spark. prefix and can be set using the --conf key=value command-line option.
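The lookup order described above can be sketched as a "first defined wins" helper. The helper first_defined and its parameters are hypothetical. The example values reuse the --remote option, spark.remote property and SPARK_REMOTE variable listed under maybeRemote. This models the precedence only, not Spark's implementation.

```python
from typing import Mapping, Optional


def first_defined(cli_value: Optional[str],
                  spark_properties: Mapping[str, str],
                  property_key: str,
                  env: Mapping[str, str],
                  env_key: Optional[str] = None) -> Optional[str]:
    """Command-line option first, then Spark property, then environment variable."""
    if cli_value is not None:
        return cli_value
    if property_key in spark_properties:
        return spark_properties[property_key]
    if env_key is not None and env_key in env:
        return env[env_key]
    return None  # nothing defined anywhere
```

An argument without an environment-variable fallback simply passes no env_key.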
Option Handling
Note
handle is part of the SparkSubmitOptionParser abstraction.
handle parses the given opt option and assigns its value to the corresponding property (see the table below).
handle returns true (continue parsing) for every action except PRINT_VERSION, for which it returns false so that parsing stops at --version.
| User Option (opt) | Property |
|---|---|
| --kill | action |
| --name | name |
| --remote | maybeRemote |
| --status | action |
| --version | action |
| ... | ... |
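The interplay between handle and the surrounding parse loop can be modeled as follows. This is a toy Python model of the options in the table. ArgsSketch, parse, OPTS_WITH_VALUE, SWITCHES and the submission_id field are all hypothetical and are not Spark's classes or fields. The point it shows is that the loop keeps calling handle until handle returns False, which happens once the action is PRINT_VERSION.

```python
from typing import List, Optional

# Hypothetical split: options that take a value vs. switches that do not.
OPTS_WITH_VALUE = {"--kill", "--name", "--remote", "--status"}
SWITCHES = {"--version"}


class ArgsSketch:
    """A toy model of SparkSubmitArguments.handle (not Spark's code)."""

    def __init__(self) -> None:
        self.action: Optional[str] = None
        self.name: Optional[str] = None
        self.maybe_remote: Optional[str] = None
        self.submission_id: Optional[str] = None  # hypothetical field

    def handle(self, opt: str, value: Optional[str]) -> bool:
        if opt == "--name":
            self.name = value
        elif opt == "--remote":
            self.maybe_remote = value
        elif opt == "--kill":
            self.submission_id = value
            self.action = "KILL"
        elif opt == "--status":
            self.submission_id = value
            self.action = "REQUEST_STATUS"
        elif opt == "--version":
            self.action = "PRINT_VERSION"
        else:
            raise ValueError(f"Unexpected argument '{opt}'")
        # Keep parsing for every action but PRINT_VERSION.
        return self.action != "PRINT_VERSION"


def parse(args: List[str], target: ArgsSketch) -> List[str]:
    """Feed known options to handle; return the arguments left unparsed."""
    i = 0
    while i < len(args):
        opt = args[i]
        if opt in OPTS_WITH_VALUE:
            if i + 1 >= len(args):
                raise ValueError(f"Missing value for {opt}")
            keep_going = target.handle(opt, args[i + 1])
            i += 2
        elif opt in SWITCHES:
            keep_going = target.handle(opt, None)
            i += 1
        else:
            break  # first non-option argument (e.g. the primary resource)
        if not keep_going:
            break
    return args[i:]
```

For example, parsing `["--name", "demo", "--version", "--kill", "app-1"]` sets the name and PRINT_VERSION, then stops and leaves `["--kill", "app-1"]` unparsed.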
maybeRemote
maybeRemote is the value of the following (in this order of precedence):
- --remote command-line option
- spark.remote configuration property
- SPARK_REMOTE environment variable
maybeRemote cannot be used together with a master or a deploy mode (they are mutually exclusive).
If maybeRemote is not specified, maybeMaster is used.
maybeRemote is included in the text representation of SparkSubmitArguments (when requested for toString).
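The two rules above (exclusivity and the fallback to the master) can be sketched like this. check_remote_exclusive and effective_target are hypothetical helpers, and the error message is illustrative rather than Spark's exact wording.

```python
from typing import Optional, Tuple


def check_remote_exclusive(maybe_remote: Optional[str],
                           maybe_master: Optional[str],
                           deploy_mode: Optional[str]) -> None:
    """Raise if a remote is combined with a master or a deploy mode."""
    if maybe_remote is not None and (maybe_master is not None or deploy_mode is not None):
        # Illustrative message; not Spark's exact wording.
        raise ValueError("remote cannot be combined with master or deploy mode")


def effective_target(maybe_remote: Optional[str],
                     maybe_master: Optional[str]) -> Tuple[str, Optional[str]]:
    """Use the remote when given, otherwise fall back to the master."""
    if maybe_remote is not None:
        return ("remote", maybe_remote)
    return ("master", maybe_master)
```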
mergeDefaultSparkProperties
mergeDefaultSparkProperties merges the Spark properties from the default Spark properties file (spark-defaults.conf) with those specified using the --conf command-line option. Properties set with --conf take precedence over the ones from the file.
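A minimal sketch of the merge follows, assuming a simplified properties file of whitespace-separated "key value" lines with # comments. parse_defaults is a hypothetical, simplified reader, not a full Java properties parser. Only keys that --conf did not already set are taken from the file.

```python
from typing import Dict, Mapping


def parse_defaults(text: str) -> Dict[str, str]:
    """Simplified reader for 'key value' lines; skips blanks and # comments."""
    properties: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)  # split on the first run of whitespace
        properties[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return properties


def merge_default_spark_properties(conf_properties: Mapping[str, str],
                                   default_properties: Mapping[str, str]) -> Dict[str, str]:
    """Fill in properties from the defaults file that --conf did not set."""
    merged = dict(conf_properties)  # --conf values always win
    for key, value in default_properties.items():
        merged.setdefault(key, value)  # only adds keys that are still missing
    return merged
```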
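isPython
isPython indicates whether the application resource is a PySpark application (a Python script or the pyspark shell).
isPython is set when SparkSubmitArguments is requested to handleUnknown, i.e. for the first argument that is not a known option (the primary resource). It is enabled when the resource name ends with .py or is pyspark-shell.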
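The check can be sketched as below. is_python and handle_unknown are hypothetical Python names modeling the described behavior. The real method also normalizes the resource (for example into a URI), which this sketch omits, and the error wording is illustrative.

```python
from typing import Dict, Optional, Union

PYSPARK_SHELL = "pyspark-shell"


def is_python(resource: Optional[str]) -> bool:
    """A Python script (*.py) or the pyspark shell counts as PySpark."""
    return resource is not None and (resource.endswith(".py") or resource == PYSPARK_SHELL)


def handle_unknown(opt: str) -> Dict[str, Union[str, bool]]:
    """Toy handleUnknown: the first non-option argument is the primary resource."""
    if opt.startswith("-"):
        # Illustrative wording for an option no handler recognized.
        raise ValueError(f"Unrecognized option '{opt}'")
    # Resource normalization is skipped in this sketch.
    return {"primary_resource": opt, "is_python": is_python(opt)}
```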
Client Deploy Mode
With the isPython flag enabled in client deploy mode, SparkSubmit determines the mainClass (and the childArgs) based on the primaryResource.
| primaryResource | mainClass |
|---|---|
| pyspark-shell | org.apache.spark.api.python.PythonGatewayServer (PySpark) |
| anything else | org.apache.spark.deploy.PythonRunner (PySpark) |
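The table can be read as the following selection rule. select_python_main_class and its py_files parameter are hypothetical. The sketch assumes that, for PythonRunner, the Python script and a comma-separated list of extra Python files are prepended to the application arguments. It returns None when the rule does not apply.

```python
from typing import List, Optional, Tuple

PYSPARK_SHELL = "pyspark-shell"


def select_python_main_class(is_python: bool,
                             deploy_mode: str,
                             primary_resource: str,
                             py_files: str,
                             child_args: List[str]) -> Optional[Tuple[str, List[str]]]:
    """Return (mainClass, childArgs) for PySpark apps in client deploy mode."""
    if not (is_python and deploy_mode == "client"):
        return None  # rule applies to PySpark in client deploy mode only
    if primary_resource == PYSPARK_SHELL:
        return ("org.apache.spark.api.python.PythonGatewayServer", child_args)
    # Assumed layout: script, extra Python files, then the app's own arguments.
    return ("org.apache.spark.deploy.PythonRunner",
            [primary_resource, py_files] + child_args)
```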