SparkSubmitArguments¶
SparkSubmitArguments is created for SparkSubmit to parseArguments.
SparkSubmitArguments is a custom SparkSubmitArgumentsParser that handles the command-line arguments of the spark-submit script, which the actions use for execution (possibly with an explicit env environment).
SparkSubmitArguments is created when the spark-submit script is launched (with only args passed in) and is later used to print the arguments in verbose mode.
Creating Instance¶
SparkSubmitArguments takes the following to be created:
- Arguments (Seq[String])
- Environment Variables (default: sys.env)
SparkSubmitArguments is created when:
- SparkSubmit is requested to parseArguments
Action¶
action: SparkSubmitAction
action is used by SparkSubmit to determine what to do when executed.
action can be one of the following SparkSubmitActions:
| Action | Description |
|---|---|
| SUBMIT | The default action if none specified |
| KILL | Indicates --kill switch |
| REQUEST_STATUS | Indicates --status switch |
| PRINT_VERSION | Indicates --version switch |
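action is undefined (null) by default (when SparkSubmitArguments is created). If no switch sets it, SUBMIT is used as the default action.
action is validated when SparkSubmitArguments is requested to validateArguments.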
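The defaulting described above can be sketched as follows. This is a minimal illustration, not Spark's code: ActionSketch, Action and resolveAction are hypothetical names, and an Option stands in for the nullable action field.

```scala
object ActionSketch {
  // Hypothetical stand-in for SparkSubmitAction (the real type is internal to Spark)
  object Action extends Enumeration {
    val SUBMIT, KILL, REQUEST_STATUS, PRINT_VERSION = Value
  }

  // None models the undefined (null) action; SUBMIT is used when no switch set one
  def resolveAction(fromSwitch: Option[Action.Value]): Action.Value =
    fromSwitch.getOrElse(Action.SUBMIT)
}
```

With no switch given, ActionSketch.resolveAction(None) gives SUBMIT, while an action set by a switch (e.g. KILL) is kept as-is.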
Command-Line Options¶
--files¶
- Configuration Property: spark.files
- Configuration Property (Spark on YARN): spark.yarn.dist.files
Printed out to standard output for --verbose option
When SparkSubmit is requested to prepareSubmitEnvironment, the files are:
Loading Spark Properties¶
loadEnvironmentArguments(): Unit
loadEnvironmentArguments loads the Spark properties for the current execution of spark-submit.
loadEnvironmentArguments reads command-line options first, followed by Spark properties and the system's environment variables.
Note
Spark config properties start with the spark. prefix and can be set using the --conf [key=value] command-line option.
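The lookup order can be sketched as a chain of fallbacks. This is a simplified model under stated assumptions: LoadEnvironmentSketch and resolve are hypothetical names, and the property and environment keys used in the usage note (spark.master, MASTER) are illustrative only.

```scala
object LoadEnvironmentSketch {
  // Returns the first defined value in this order:
  // command-line option, then Spark property, then environment variable
  def resolve(
      cliValue: Option[String],
      propertyKey: String,
      envKey: String,
      sparkProperties: Map[String, String],
      env: Map[String, String] = sys.env): Option[String] =
    cliValue
      .orElse(sparkProperties.get(propertyKey))
      .orElse(env.get(envKey))
}
```

For example, resolve(None, "spark.master", "MASTER", Map("spark.master" -> "yarn"), Map("MASTER" -> "local[*]")) picks the Spark property, since no command-line value was given and the property comes before the environment variable.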
Option Handling¶
SparkSubmitOptionParser
handle(
opt: String,
value: String): Boolean
handle is part of the SparkSubmitOptionParser abstraction.
handle parses the input opt argument and assigns the given value to corresponding properties.
In the end, handle returns true for any action but PRINT_VERSION (and false when the action is PRINT_VERSION).
| User Option (opt) | Property |
|---|---|
| --kill | action |
| --name | name |
| --status | action |
| --version | action |
| ... | ... |
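The option-to-property mapping in the table can be sketched with a pattern match. This is not Spark's implementation: HandleSketch and ParsedArgs are hypothetical, only the four options from the table are covered, and an Option models the nullable action.

```scala
object HandleSketch {
  sealed trait Action
  case object Kill extends Action
  case object RequestStatus extends Action
  case object PrintVersion extends Action

  // Hypothetical mutable holder for the properties that handle assigns
  final class ParsedArgs {
    var action: Option[Action] = None
    var name: Option[String] = None
  }

  // Assigns the value to the matching property and
  // returns true for any action but PrintVersion
  def handle(args: ParsedArgs, opt: String, value: String): Boolean = {
    opt match {
      case "--kill"    => args.action = Some(Kill)
      case "--status"  => args.action = Some(RequestStatus)
      case "--version" => args.action = Some(PrintVersion)
      case "--name"    => args.name = Option(value)
      case other       => throw new IllegalArgumentException(s"Unexpected option: $other")
    }
    !args.action.contains(PrintVersion)
  }
}
```

The sketch rejects options outside the table with an exception; that is a simplification chosen here to keep the example short.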
mergeDefaultSparkProperties¶
mergeDefaultSparkProperties(): Unit
mergeDefaultSparkProperties merges Spark properties from the default Spark properties file (i.e. spark-defaults.conf) with those specified through the --conf command-line option.
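The merge can be sketched with plain maps. This is a minimal sketch that assumes explicitly given --conf values take precedence over entries from the defaults file; MergeDefaultsSketch and mergeDefaults are hypothetical names.

```scala
object MergeDefaultsSketch {
  // A default is used only when the same key was not given explicitly;
  // with ++, entries of the right-hand map win on key conflicts
  def mergeDefaults(
      defaults: Map[String, String],
      explicit: Map[String, String]): Map[String, String] =
    defaults ++ explicit
}
```

Keys present only in the defaults are added unchanged, while a key present in both keeps the explicitly given value.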
isPython¶
isPython: Boolean = false
isPython indicates whether the application resource is a PySpark application (a Python script or pyspark shell).
isPython is set (based on SparkSubmit.isPython) when SparkSubmitArguments is requested to handle an unknown option.
Client Deploy Mode¶
With isPython flag enabled, SparkSubmit determines the mainClass (and the childArgs) based on the primaryResource.
| primaryResource | mainClass |
|---|---|
| pyspark-shell | org.apache.spark.api.python.PythonGatewayServer (PySpark) |
| anything else | org.apache.spark.deploy.PythonRunner (PySpark) |
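The selection in the table can be sketched as a single conditional. PythonMainClassSketch and its methods are hypothetical names, and the isPython check (a .py suffix or the pyspark-shell resource) is an assumption made for this illustration only.

```scala
object PythonMainClassSketch {
  // Assumption: a resource counts as Python when it is a .py script or the pyspark shell
  def isPython(primaryResource: String): Boolean =
    primaryResource != null &&
      (primaryResource.endsWith(".py") || primaryResource == "pyspark-shell")

  // Mirrors the table: the pyspark shell gets the gateway server, anything else the runner
  def mainClassFor(primaryResource: String): String =
    if (primaryResource == "pyspark-shell") "org.apache.spark.api.python.PythonGatewayServer"
    else "org.apache.spark.deploy.PythonRunner"
}
```

mainClassFor is meant to be consulted only when isPython holds for the resource, as in the table above.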