# SparkSubmitCommandBuilder
SparkSubmitCommandBuilder is an AbstractCommandBuilder.
SparkSubmitCommandBuilder is used to build a command that spark-submit and SparkLauncher use to launch a Spark application.
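For instance, a Spark application can be launched programmatically with the public SparkLauncher API, which relies on SparkSubmitCommandBuilder to assemble the spark-submit command behind the scenes. The following is a minimal sketch only; the jar path, main class and master are made-up values.

```java
import org.apache.spark.launcher.SparkAppHandle;
import org.apache.spark.launcher.SparkLauncher;

// Launches a Spark application as a child process. SparkLauncher uses
// SparkSubmitCommandBuilder internally to build the spark-submit command.
// The jar, main class and master below are illustrative values only.
public class LaunchDemo {
  public static void main(String[] args) throws Exception {
    SparkAppHandle handle = new SparkLauncher()
        .setAppResource("/tmp/my-spark-app.jar")   // hypothetical application jar
        .setMainClass("com.example.MyApp")         // hypothetical main class
        .setMaster("local[*]")
        .setConf(SparkLauncher.DRIVER_MEMORY, "2g")
        .startApplication();

    // Block until the application reaches a terminal state.
    while (!handle.getState().isFinal()) {
      Thread.sleep(1000);
    }
    System.out.println("Final state: " + handle.getState());
  }
}
```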
SparkSubmitCommandBuilder uses the first argument to distinguish the shells:
- pyspark-shell-main
- sparkr-shell-main
- run-example
SparkSubmitCommandBuilder parses command-line arguments using OptionParser (a SparkSubmitOptionParser). OptionParser comes with the following methods:
- handle to handle the known options (see the table below). It sets up the master, deployMode, propertiesFile, conf, mainClass, sparkArgs internal properties.
- handleUnknown to handle unrecognized options that usually lead to an Unrecognized option error message.
- handleExtraArgs to handle extra arguments that are considered a Spark application's arguments.
Note
For spark-shell, the application arguments are assumed to come after spark-submit's arguments.
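The sketch below is not Spark's OptionParser (which is internal to the launcher module); it is a simplified stand-in that only illustrates how the three callbacks split a command line into recognized spark-submit options, unknown options, and the application's own arguments.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified stand-in for OptionParser's three callbacks (not Spark's actual class).
public class OptionParserSketch {
  final List<String> sparkArgs = new ArrayList<>();
  final List<String> appArgs = new ArrayList<>();
  String master;
  String mainClass;

  // handle: a recognized spark-submit option ends up in an internal property.
  boolean handle(String opt, String value) {
    switch (opt) {
      case "--master": master = value; break;
      case "--class":  mainClass = value; break;
      default:
        sparkArgs.add(opt);
        if (value != null) sparkArgs.add(value);
    }
    return true;  // keep parsing
  }

  // handleUnknown: an unrecognized option usually ends up as an error.
  boolean handleUnknown(String opt) {
    throw new IllegalArgumentException("Unrecognized option: " + opt);
  }

  // handleExtraArgs: everything left over belongs to the application.
  void handleExtraArgs(List<String> extra) {
    appArgs.addAll(extra);
  }

  public static void main(String[] args) {
    OptionParserSketch parser = new OptionParserSketch();
    parser.handle("--master", "local[*]");
    parser.handle("--class", "com.example.MyApp");
    parser.handleExtraArgs(Arrays.asList("appArg1", "appArg2"));
    System.out.println(parser.master + " " + parser.mainClass + " " + parser.appArgs);
  }
}
```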
## pyspark-shell-main Application Resource
When the bin/pyspark shell script (and bin\pyspark2.cmd) is launched, it uses bin/spark-submit with the pyspark-shell-main application resource as the first argument (followed by the --name "PySparkShell" option, among others).
pyspark-shell-main is used when:
- SparkSubmitCommandBuilder is created and then requested to build a command (buildPySparkShellCommand, actually)
## Building Command
```java
List<String> buildCommand(
  Map<String, String> env)
```
buildCommand is part of the AbstractCommandBuilder abstraction.
buildCommand branches off based on the application resource.
| Application Resource | Command Builder |
|---|---|
| pyspark-shell-main (but not isSpecialCommand) | buildPySparkShellCommand |
| sparkr-shell-main (but not isSpecialCommand) | buildSparkRCommand |
| anything else | buildSparkSubmitCommand |
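A hedged sketch of that branching (field names simplified, builder method bodies stubbed out; this is not the exact Spark source):

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hedged sketch of the dispatch in buildCommand based on the application resource.
public class BuildCommandSketch {
  static final String PYSPARK_SHELL_RESOURCE = "pyspark-shell-main";
  static final String SPARKR_SHELL_RESOURCE = "sparkr-shell-main";

  String appResource;        // the first command-line argument
  boolean isSpecialCommand;  // e.g. a --version or --help run

  List<String> buildCommand(Map<String, String> env) {
    if (PYSPARK_SHELL_RESOURCE.equals(appResource) && !isSpecialCommand) {
      return buildPySparkShellCommand(env);
    } else if (SPARKR_SHELL_RESOURCE.equals(appResource) && !isSpecialCommand) {
      return buildSparkRCommand(env);
    } else {
      return buildSparkSubmitCommand(env);
    }
  }

  // Stubs standing in for the real builders described in this page.
  List<String> buildPySparkShellCommand(Map<String, String> env) { return Collections.emptyList(); }
  List<String> buildSparkRCommand(Map<String, String> env) { return Collections.emptyList(); }
  List<String> buildSparkSubmitCommand(Map<String, String> env) { return Collections.emptyList(); }

  public static void main(String[] args) {
    BuildCommandSketch builder = new BuildCommandSketch();
    builder.appResource = "pyspark-shell-main";
    System.out.println(builder.buildCommand(Map.of()));  // dispatches to buildPySparkShellCommand
  }
}
```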
## buildPySparkShellCommand
```java
List<String> buildPySparkShellCommand(
  Map<String, String> env)
```
buildPySparkShellCommand makes sure that the application arguments (appArgs) are empty.
buildPySparkShellCommand sets the application resource as pyspark-shell.
Note
buildPySparkShellCommand is executed when requested for a command with the pyspark-shell-main application resource, which is re-defined (reset) to pyspark-shell at this point.
buildPySparkShellCommand constructEnvVarArgs with the given env and PYSPARK_SUBMIT_ARGS (the name of the environment variable to set).
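In other words, the spark-submit arguments are joined into a single string and put into the given env under PYSPARK_SUBMIT_ARGS. The sketch below only illustrates that idea with naive quoting; it is not Spark's actual implementation.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hedged sketch: joining spark-submit arguments into the
// PYSPARK_SUBMIT_ARGS environment variable (quoting simplified).
public class EnvVarArgsSketch {
  public static void main(String[] args) {
    List<String> submitArgs =
        Arrays.asList("--master", "local[*]", "--name", "PySparkShell", "pyspark-shell");
    Map<String, String> env = new HashMap<>();
    StringBuilder joined = new StringBuilder();
    for (String arg : submitArgs) {
      if (joined.length() > 0) joined.append(' ');
      joined.append('"').append(arg).append('"');  // naive quoting, for illustration only
    }
    env.put("PYSPARK_SUBMIT_ARGS", joined.toString());
    System.out.println(env);
  }
}
```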
buildPySparkShellCommand defines an internal pyargs collection for the parts of the shell command to execute.
buildPySparkShellCommand stores the Python executable (in pyargs) as the first of the following that is specified, in this order (see the sketch after this list):

- spark.pyspark.driver.python configuration property
- spark.pyspark.python configuration property
- PYSPARK_DRIVER_PYTHON environment variable
- PYSPARK_PYTHON environment variable
- python3
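A minimal sketch of that resolution order; the firstNonEmpty helper and the conf map here are illustrative stand-ins, not Spark's own utilities.

```java
import java.util.Map;

// Hedged sketch of the Python executable resolution order described above.
public class PythonExecSketch {
  static String firstNonEmpty(String... candidates) {
    for (String c : candidates) {
      if (c != null && !c.isEmpty()) return c;
    }
    return null;
  }

  static String pythonExecutable(Map<String, String> conf) {
    return firstNonEmpty(
        conf.get("spark.pyspark.driver.python"),
        conf.get("spark.pyspark.python"),
        System.getenv("PYSPARK_DRIVER_PYTHON"),
        System.getenv("PYSPARK_PYTHON"),
        "python3");
  }

  public static void main(String[] args) {
    System.out.println(pythonExecutable(Map.of("spark.pyspark.python", "/usr/bin/python3.11")));
  }
}
```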
buildPySparkShellCommand sets the environment variables (for the Python executable to use), if specified.
| Environment Variable | Configuration Property |
|---|---|
| PYSPARK_PYTHON | spark.pyspark.python |
| SPARK_REMOTE | remote option or spark.remote |
In the end, buildPySparkShellCommand copies all the options from PYSPARK_DRIVER_PYTHON_OPTS, if specified.
## buildSparkSubmitCommand
```java
List<String> buildSparkSubmitCommand(
  Map<String, String> env)
```
buildSparkSubmitCommand starts by building the so-called effective config. When in client deploy mode, buildSparkSubmitCommand adds spark.driver.extraClassPath to the resulting Spark command.
buildSparkSubmitCommand builds the first part of the Java command passing in the extra classpath (only for client deploy mode).
FIXME: Add the isThriftServer case.
buildSparkSubmitCommand appends SPARK_SUBMIT_OPTS and SPARK_JAVA_OPTS environment variables.
(only for client deploy mode) ...
FIXME: Elaborate on the client deploy mode case and on addPermGenSizeOpt.
buildSparkSubmitCommand appends org.apache.spark.deploy.SparkSubmit and the command-line arguments (using buildSparkSubmitArgs).
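Put together, the command that buildSparkSubmitCommand returns has roughly the following shape. The sketch below only illustrates that layout; the paths, JVM options and spark-submit arguments are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hedged illustration of the overall shape of the command returned by
// buildSparkSubmitCommand (all concrete values are made up).
public class SparkSubmitCommandShape {
  public static void main(String[] args) {
    List<String> cmd = new ArrayList<>();
    // 1. The Java executable and classpath (the extra classpath only in client deploy mode).
    cmd.addAll(Arrays.asList(
        "/usr/lib/jvm/java/bin/java", "-cp", "/opt/spark/conf:/opt/spark/jars/*"));
    // 2. JVM options, e.g. from SPARK_SUBMIT_OPTS and SPARK_JAVA_OPTS.
    cmd.add("-Xmx2g");
    // 3. The spark-submit main class.
    cmd.add("org.apache.spark.deploy.SparkSubmit");
    // 4. The command-line arguments produced by buildSparkSubmitArgs.
    cmd.addAll(Arrays.asList(
        "--master", "local[*]",
        "--class", "com.example.MyApp",
        "/tmp/my-spark-app.jar"));
    System.out.println(String.join(" ", cmd));
  }
}
```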
## buildSparkSubmitArgs
```java
List<String> buildSparkSubmitArgs()
```
buildSparkSubmitArgs builds a list of command-line arguments for spark-submit.
buildSparkSubmitArgs uses a SparkSubmitOptionParser to add the command-line arguments that spark-submit recognizes (when it is executed later on and uses the very same SparkSubmitOptionParser parser to parse command-line arguments).
buildSparkSubmitArgs is used when:
- InProcessLauncher is requested to startApplication
- SparkLauncher is requested to createBuilder
- SparkSubmitCommandBuilder is requested to buildSparkSubmitCommand and constructEnvVarArgs
## SparkSubmitCommandBuilder Properties and SparkSubmitOptionParser Attributes
| SparkSubmitCommandBuilder Property | SparkSubmitOptionParser Attribute |
|---|---|
| verbose | VERBOSE |
| master | MASTER [master] |
| deployMode | DEPLOY_MODE [deployMode] |
| appName | NAME [appName] |
| conf | CONF [key=value]* |
| propertiesFile | PROPERTIES_FILE [propertiesFile] |
| jars | JARS [comma-separated jars] |
| files | FILES [comma-separated files] |
| pyFiles | PY_FILES [comma-separated pyFiles] |
| mainClass | CLASS [mainClass] |
| sparkArgs | sparkArgs (passed straight through) |
| appResource | appResource (passed straight through) |
| appArgs | appArgs (passed straight through) |
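As a worked example of the mapping above (all values made up), a builder configured with a master, a deploy mode, an application name, one conf entry, a main class, an application resource and application arguments would yield an argument list along these lines:

```java
import java.util.Arrays;
import java.util.List;

// Hedged illustration of the argument list buildSparkSubmitArgs produces
// for a typical configuration (all values are made up).
public class SparkSubmitArgsExample {
  public static void main(String[] args) {
    List<String> sparkSubmitArgs = Arrays.asList(
        "--master", "yarn",                    // master      -> MASTER
        "--deploy-mode", "cluster",            // deployMode  -> DEPLOY_MODE
        "--name", "MyApp",                     // appName     -> NAME
        "--conf", "spark.executor.memory=4g",  // conf        -> CONF (key=value)
        "--class", "com.example.MyApp",        // mainClass   -> CLASS
        "/tmp/my-spark-app.jar",               // appResource (passed straight through)
        "appArg1", "appArg2");                 // appArgs     (passed straight through)
    System.out.println(String.join(" ", sparkSubmitArgs));
  }
}
```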