SparkSubmitArguments¶
SparkSubmitArguments is created for SparkSubmit to parseArguments.
SparkSubmitArguments is a custom SparkSubmitArgumentsParser that handles the command-line arguments of the spark-submit script that the actions use for execution (possibly with an explicit env environment).
SparkSubmitArguments is created when the spark-submit script is launched, with only args passed in, and is later used for printing the arguments in verbose mode.
Creating Instance¶
SparkSubmitArguments takes the following to be created:
- Arguments (Seq[String])
- Environment Variables (default: sys.env)
SparkSubmitArguments is created when:
- SparkSubmit is requested to parseArguments
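The constructor shape above can be mirrored in a small standalone sketch. SubmitArgsSketch and its helper methods are hypothetical names (this is not Spark's class); the sketch only illustrates why the environment is a parameter with a sys.env default: a caller such as a test can pass an explicit map instead.

```scala
// Hypothetical stand-in for illustration only; not Spark's SparkSubmitArguments.
final class SubmitArgsSketch(
    val args: Seq[String],
    val env: Map[String, String] = sys.env) {

  // Look up an environment variable in whichever environment was supplied.
  def envVar(name: String): Option[String] = env.get(name)
}

object SubmitArgsSketch {
  // Default environment: the real process environment (sys.env).
  def fromCommandLine(args: Seq[String]): SubmitArgsSketch =
    new SubmitArgsSketch(args)

  // Explicit environment: convenient for tests and reproducible runs.
  def withEnv(args: Seq[String], env: Map[String, String]): SubmitArgsSketch =
    new SubmitArgsSketch(args, env)
}
```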
Action¶
action: SparkSubmitAction
action is used by SparkSubmit to determine what to do when executed.
action can be one of the following SparkSubmitActions:
Action | Description |
---|---|
SUBMIT | The default action if none specified |
KILL | Indicates --kill switch |
REQUEST_STATUS | Indicates --status switch |
PRINT_VERSION | Indicates --version switch |
action is undefined (null) by default (when SparkSubmitArguments is created).
action is validated in validateArguments.
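As a rough illustration of the table and the default described above, the following standalone sketch models the four actions and the fallback to SUBMIT. ActionSketch, fromSwitch and resolve are hypothetical names, None stands in for the null default, and taking the first matching switch is an assumption of the sketch rather than Spark's behavior.

```scala
object ActionSketch {
  // Illustrative model of the four actions listed in the table.
  sealed trait Action
  case object Submit extends Action
  case object Kill extends Action
  case object RequestStatus extends Action
  case object PrintVersion extends Action

  // Map a command-line switch to an action; None when the switch selects no action.
  def fromSwitch(switch: String): Option[Action] = switch match {
    case "--kill"    => Some(Kill)
    case "--status"  => Some(RequestStatus)
    case "--version" => Some(PrintVersion)
    case _           => None
  }

  // An unset action (None here, null in the text) falls back to Submit.
  // Assumption: the first action-selecting switch wins.
  def resolve(switches: Seq[String]): Action =
    switches.flatMap(fromSwitch(_).toList).headOption.getOrElse(Submit)
}
```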
Command-Line Options¶
--files¶
- Configuration Property: spark.files
- Configuration Property (Spark on YARN): spark.yarn.dist.files
Printed out to standard output for --verbose option
When SparkSubmit is requested to prepareSubmitEnvironment, the files are:
Loading Spark Properties¶
loadEnvironmentArguments(): Unit
loadEnvironmentArguments loads the Spark properties for the current execution of spark-submit.
loadEnvironmentArguments reads command-line options first, followed by Spark properties and the system's environment variables.
Note
Spark config properties start with the spark. prefix and can be set using the --conf [key=value] command-line option.
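The lookup order described above (command-line option, then Spark property, then environment variable) can be sketched with a chain of Option fallbacks. PrecedenceSketch and resolve are hypothetical names, and the keys are supplied by the caller; this is a minimal sketch of the ordering, not Spark's implementation.

```scala
object PrecedenceSketch {
  // Pick the first defined value: command-line option, then Spark property,
  // then environment variable.
  def resolve(
      cliValue: Option[String],
      sparkProperties: Map[String, String],
      propertyKey: String,
      env: Map[String, String],
      envKey: String): Option[String] =
    cliValue
      .orElse(sparkProperties.get(propertyKey))
      .orElse(env.get(envKey))
}
```

For example, with a master given on the command line, in a Spark property and in an environment variable, the command-line value is the one returned; removing it exposes the property, and removing that exposes the environment variable.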
Option Handling¶
SparkSubmitOptionParser
handle(
opt: String,
value: String): Boolean
handle is part of the SparkSubmitOptionParser abstraction.
handle parses the input opt argument and assigns the given value to the corresponding properties.
In the end, handle returns true for any action but PRINT_VERSION (and false otherwise).
User Option (opt) | Property |
---|---|
--kill | action |
--name | name |
--status | action |
--version | action |
... | ... |
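A minimal standalone sketch of this dispatch follows, covering only the four options in the table. HandleSketch and State are hypothetical names, the remaining options are elided just as in the table, and the return value follows the rule stated above (false only for PRINT_VERSION).

```scala
object HandleSketch {
  sealed trait Action
  case object Kill extends Action
  case object RequestStatus extends Action
  case object PrintVersion extends Action

  // Mutable state that the options are assigned to.
  final class State {
    var action: Option[Action] = None
    var name: Option[String] = None
  }

  // Assigns the value for a known option; returns false only when the
  // resulting action is PrintVersion.
  def handle(state: State, opt: String, value: String): Boolean = {
    opt match {
      case "--kill"    => state.action = Some(Kill)
      case "--status"  => state.action = Some(RequestStatus)
      case "--version" => state.action = Some(PrintVersion)
      case "--name"    => state.name = Option(value)
      case _           => () // other options are elided, as in the table
    }
    !state.action.contains(PrintVersion)
  }
}
```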
mergeDefaultSparkProperties¶
mergeDefaultSparkProperties(): Unit
mergeDefaultSparkProperties merges Spark properties from the default Spark properties file (i.e. spark-defaults.conf) with those specified through the --conf command-line option.
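The merge can be pictured as a union of two maps. MergeDefaultsSketch and merge are hypothetical names, and the maps stand in for the parsed --conf options and the parsed properties file. The precedence shown (a --conf value wins over a default from the file) follows Spark's documented configuration order; the rest is a sketch, not Spark's code.

```scala
object MergeDefaultsSketch {
  // Keep every --conf value; a default is used only when its key is not
  // already set (the right-hand map wins on duplicate keys).
  def merge(
      confOptions: Map[String, String],
      defaults: Map[String, String]): Map[String, String] =
    defaults ++ confOptions
}
```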
isPython¶
isPython: Boolean = false
isPython indicates whether the application resource is a PySpark application (a Python script or pyspark shell).
isPython is set when SparkSubmitArguments is requested to handle an unknown option.
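The check behind the flag can be approximated as follows. IsPythonSketch is a hypothetical name, and the rule used here (a .py suffix or the pyspark-shell resource name) is an assumption based on the description above, not a copy of Spark's code.

```scala
object IsPythonSketch {
  // Resource name of the PySpark shell.
  val PySparkShell = "pyspark-shell"

  // Assumption: a resource counts as PySpark when it is a .py script
  // or the pyspark shell itself. A null resource is never PySpark.
  def isPython(resource: String): Boolean =
    resource != null && (resource.endsWith(".py") || resource == PySparkShell)
}
```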
Client Deploy Mode¶
With the isPython flag enabled, SparkSubmit determines the mainClass (and the childArgs) based on the primaryResource.
primaryResource | mainClass |
---|---|
pyspark-shell | org.apache.spark.api.python.PythonGatewayServer (PySpark) |
anything else | org.apache.spark.deploy.PythonRunner (PySpark) |
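The table translates directly into a two-way choice. PythonMainClassSketch and mainClassFor are hypothetical names; the class names are the ones listed in the table, and childArgs are left out of this sketch.

```scala
object PythonMainClassSketch {
  // Choose the main class for a PySpark application in client deploy mode.
  def mainClassFor(primaryResource: String): String =
    if (primaryResource == "pyspark-shell")
      "org.apache.spark.api.python.PythonGatewayServer"
    else
      "org.apache.spark.deploy.PythonRunner"
}
```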