SparkSubmitCommandBuilder¶
SparkSubmitCommandBuilder is an AbstractCommandBuilder.

SparkSubmitCommandBuilder is used to build a command that spark-submit and SparkLauncher use to launch a Spark application.

SparkSubmitCommandBuilder uses the first argument to distinguish the shells:

- pyspark-shell-main
- sparkr-shell-main
- run-example

SparkSubmitCommandBuilder parses command-line arguments using OptionParser (which is a SparkSubmitOptionParser). OptionParser comes with the following methods:

- handle to handle the known options (see the table below). It sets up the master, deployMode, propertiesFile, conf, mainClass, and sparkArgs internal properties.
- handleUnknown to handle unrecognized options that usually lead to an Unrecognized option error message.
- handleExtraArgs to handle extra arguments that are considered a Spark application's arguments.

Note
For spark-shell, it assumes that the application arguments come after spark-submit's arguments.
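
The division of labour between the three methods can be illustrated with a minimal sketch. This is not Spark's actual OptionParser: the class name OptionParserSketch, the KNOWN list (limited to three options for brevity), and the simplified parse loop are all assumptions made for illustration. Only the method names and the Unrecognized option message come from the description above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified stand-in for OptionParser; not Spark's actual code.
class OptionParserSketch {
    // Only three known options are modelled here (an assumption for brevity).
    static final List<String> KNOWN = Arrays.asList("--master", "--deploy-mode", "--class");

    String master;
    String deployMode;
    String mainClass;
    String appResource;
    final List<String> appArgs = new ArrayList<>();

    // Known option: store its value and keep parsing.
    boolean handle(String opt, String value) {
        switch (opt) {
            case "--master": master = value; break;
            case "--deploy-mode": deployMode = value; break;
            case "--class": mainClass = value; break;
            default: throw new IllegalArgumentException("Not a known option: " + opt);
        }
        return true;
    }

    // Unknown token: option-like tokens are errors; anything else is taken
    // as the application resource and stops option parsing.
    boolean handleUnknown(String opt) {
        if (opt.startsWith("-")) {
            throw new IllegalArgumentException("Unrecognized option: " + opt);
        }
        appResource = opt;
        return false;
    }

    // Whatever is left belongs to the Spark application itself.
    void handleExtraArgs(List<String> extra) {
        appArgs.addAll(extra);
    }

    void parse(List<String> args) {
        int idx = 0;
        while (idx < args.size()) {
            String arg = args.get(idx);
            boolean keepGoing;
            if (KNOWN.contains(arg)) {
                if (idx + 1 >= args.size()) {
                    throw new IllegalArgumentException("Missing argument for option '" + arg + "'.");
                }
                keepGoing = handle(arg, args.get(idx + 1));
                idx += 2;
            } else {
                keepGoing = handleUnknown(arg);
                idx += 1;
            }
            if (!keepGoing) {
                break;
            }
        }
        handleExtraArgs(args.subList(idx, args.size()));
    }

    public static void main(String[] argv) {
        OptionParserSketch parser = new OptionParserSketch();
        parser.parse(Arrays.asList("--master", "local[*]", "app.jar", "--input", "data.txt"));
        System.out.println(parser.master + " " + parser.appResource + " " + parser.appArgs);
    }
}
```

In this sketch, everything after the application resource is handed to handleExtraArgs untouched, which is why an option-like token such as --input ends up among the application's arguments rather than raising an error.
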
pyspark-shell-main Application Resource¶
When the bin/pyspark shell script (or bin\pyspark2.cmd) is launched, it runs bin/spark-submit with the pyspark-shell-main application resource as the first argument (followed by the --name "PySparkShell" option, among others).

pyspark-shell-main is used when:

- SparkSubmitCommandBuilder is created and then requested to build a command (buildPySparkShellCommand actually)

Building Command¶
AbstractCommandBuilder
List<String> buildCommand(
Map<String, String> env)

buildCommand is part of the AbstractCommandBuilder abstraction.

buildCommand branches off based on the application resource.

Application Resource | Command Builder |
---|---|
pyspark-shell-main (but not isSpecialCommand) | buildPySparkShellCommand |
sparkr-shell-main (but not isSpecialCommand) | buildSparkRCommand |
anything else | buildSparkSubmitCommand |
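
The branching in the table above can be sketched as follows. The class CommandDispatchSketch and the selectBuilder helper are hypothetical names introduced for illustration; the helper returns the name of the builder method rather than invoking it, so the example stays self-contained.

```java
// Hypothetical sketch of the dispatch described in the table; not Spark's actual code.
class CommandDispatchSketch {
    static final String PYSPARK_SHELL = "pyspark-shell-main";
    static final String SPARKR_SHELL = "sparkr-shell-main";

    // Returns the name of the command builder the table selects.
    static String selectBuilder(String appResource, boolean isSpecialCommand) {
        if (PYSPARK_SHELL.equals(appResource) && !isSpecialCommand) {
            return "buildPySparkShellCommand";
        } else if (SPARKR_SHELL.equals(appResource) && !isSpecialCommand) {
            return "buildSparkRCommand";
        }
        // Anything else, including special commands, goes through spark-submit.
        return "buildSparkSubmitCommand";
    }

    public static void main(String[] args) {
        System.out.println(selectBuilder("pyspark-shell-main", false));
        System.out.println(selectBuilder("app.jar", false));
    }
}
```
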
buildPySparkShellCommand¶
List<String> buildPySparkShellCommand(
Map<String, String> env)
appArgs expected to be empty
buildPySparkShellCommand makes sure that appArgs is empty.

buildPySparkShellCommand sets the application resource as pyspark-shell.

pyspark-shell-main redefined to pyspark-shell
buildPySparkShellCommand is executed when requested for a command with the pyspark-shell-main application resource, which is redefined (reset) to pyspark-shell at this point.

buildPySparkShellCommand calls constructEnvVarArgs with the given env and PYSPARK_SUBMIT_ARGS.

buildPySparkShellCommand defines an internal pyargs collection for the parts of the shell command to execute.

buildPySparkShellCommand stores the Python executable (in pyargs) to be the first specified in the following order:

- spark.pyspark.driver.python configuration property
- spark.pyspark.python configuration property
- PYSPARK_DRIVER_PYTHON environment variable
- PYSPARK_PYTHON environment variable
- python3
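
The lookup order above amounts to "take the first non-empty candidate". A minimal sketch, assuming the configuration and the environment are available as plain maps; the class PythonExecSketch and both helper methods are written here for illustration only.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the Python executable lookup; not Spark's actual code.
class PythonExecSketch {
    // Returns the first candidate that is neither null nor empty, or null if none.
    static String firstNonEmpty(String... candidates) {
        for (String candidate : candidates) {
            if (candidate != null && !candidate.isEmpty()) {
                return candidate;
            }
        }
        return null;
    }

    // conf and env are assumed to be simple key-value maps.
    static String resolvePythonExec(Map<String, String> conf, Map<String, String> env) {
        return firstNonEmpty(
            conf.get("spark.pyspark.driver.python"),
            conf.get("spark.pyspark.python"),
            env.get("PYSPARK_DRIVER_PYTHON"),
            env.get("PYSPARK_PYTHON"),
            "python3"); // final fallback
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        Map<String, String> env = new HashMap<>();
        env.put("PYSPARK_PYTHON", "/usr/bin/python3.11");
        System.out.println(resolvePythonExec(conf, env));
    }
}
```

Configuration properties take precedence over environment variables, and the driver-specific setting wins over the general one within each group.
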

buildPySparkShellCommand sets the environment variables (for the Python executable to use), if specified.

Environment Variable | Configuration Property |
---|---|
PYSPARK_PYTHON | spark.pyspark.python |
SPARK_REMOTE | remote option or spark.remote |
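
The mapping in the table can be sketched as below. The class PySparkEnvSketch, the exportEnv helper, and its parameter names are hypothetical; the sketch assumes the remote option takes precedence over spark.remote, following the order in which the table lists them.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of exporting settings to the Python process; not Spark's actual code.
class PySparkEnvSketch {
    // Copies the relevant settings into env only when they are specified.
    static void exportEnv(Map<String, String> conf, String remoteOption, Map<String, String> env) {
        String python = conf.get("spark.pyspark.python");
        if (python != null) {
            env.put("PYSPARK_PYTHON", python);
        }
        // Assumption: the remote option wins over the spark.remote property.
        String remote = remoteOption != null ? remoteOption : conf.get("spark.remote");
        if (remote != null) {
            env.put("SPARK_REMOTE", remote);
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("spark.pyspark.python", "python3.11");
        Map<String, String> env = new HashMap<>();
        exportEnv(conf, null, env);
        System.out.println(env);
    }
}
```
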

In the end, buildPySparkShellCommand copies all the options from PYSPARK_DRIVER_PYTHON_OPTS, if specified.

buildSparkSubmitCommand¶
List<String> buildSparkSubmitCommand(
Map<String, String> env)

buildSparkSubmitCommand starts by building the so-called effective config. When in client mode, buildSparkSubmitCommand adds spark.driver.extraClassPath to the result Spark command.

buildSparkSubmitCommand builds the first part of the Java command, passing in the extra classpath (only for client deploy mode).

Add isThriftServer case

buildSparkSubmitCommand appends the SPARK_SUBMIT_OPTS and SPARK_JAVA_OPTS environment variables.

(only for client deploy mode) ...

Elaborate on the client deploy mode case

addPermGenSizeOpt case...elaborate

Elaborate on addPermGenSizeOpt

buildSparkSubmitCommand appends org.apache.spark.deploy.SparkSubmit and the command-line arguments (using buildSparkSubmitArgs).

buildSparkSubmitArgs¶
List<String> buildSparkSubmitArgs()

buildSparkSubmitArgs builds a list of command-line arguments for spark-submit.

buildSparkSubmitArgs uses a SparkSubmitOptionParser to add the command-line arguments that spark-submit recognizes (when it is executed later on and uses the very same SparkSubmitOptionParser parser to parse command-line arguments).

buildSparkSubmitArgs is used when:

- InProcessLauncher is requested to startApplication
- SparkLauncher is requested to createBuilder
- SparkSubmitCommandBuilder is requested to buildSparkSubmitCommand and constructEnvVarArgs
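
A rough sketch of how builder properties could be turned into spark-submit arguments is shown below. It is a simplification under stated assumptions: the class SubmitArgsSketch, its fields, and the addOption helper are hypothetical, only a subset of the properties is covered, and the option names are written as string literals instead of being looked up through SparkSubmitOptionParser.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, reduced sketch of building spark-submit arguments; not Spark's actual code.
class SubmitArgsSketch {
    boolean verbose;
    String master;
    String deployMode;
    String appName;
    String mainClass;
    String appResource;
    final Map<String, String> conf = new LinkedHashMap<>();
    final List<String> appArgs = new ArrayList<>();

    List<String> buildArgs() {
        List<String> args = new ArrayList<>();
        if (verbose) {
            args.add("--verbose");
        }
        addOption(args, "--master", master);
        addOption(args, "--deploy-mode", deployMode);
        addOption(args, "--name", appName);
        // Each configuration entry becomes its own --conf key=value pair.
        for (Map.Entry<String, String> entry : conf.entrySet()) {
            args.add("--conf");
            args.add(entry.getKey() + "=" + entry.getValue());
        }
        addOption(args, "--class", mainClass);
        // The application resource and its arguments are passed straight through.
        if (appResource != null) {
            args.add(appResource);
        }
        args.addAll(appArgs);
        return args;
    }

    // Adds "name value" only when the value is set.
    private static void addOption(List<String> args, String name, String value) {
        if (value != null) {
            args.add(name);
            args.add(value);
        }
    }

    public static void main(String[] argv) {
        SubmitArgsSketch builder = new SubmitArgsSketch();
        builder.master = "local[2]";
        builder.mainClass = "org.example.Main";
        builder.appResource = "app.jar";
        System.out.println(builder.buildArgs());
    }
}
```

Unset properties produce no arguments at all, so the resulting list contains only what spark-submit needs to reconstruct the same settings when it parses the command line later.
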
SparkSubmitCommandBuilder Properties and SparkSubmitOptionParser Attributes¶
SparkSubmitCommandBuilder Property | SparkSubmitOptionParser Attribute |
---|---|
verbose | VERBOSE |
master | MASTER [master] |
deployMode | DEPLOY_MODE [deployMode] |
appName | NAME [appName] |
conf | CONF [key=value]* |
propertiesFile | PROPERTIES_FILE [propertiesFile] |
jars | JARS [comma-separated jars] |
files | FILES [comma-separated files] |
pyFiles | PY_FILES [comma-separated pyFiles] |
mainClass | CLASS [mainClass] |
sparkArgs | sparkArgs (passed straight through) |
appResource | appResource (passed straight through) |
appArgs | appArgs (passed straight through) |