# Demo: Executing PySpark Applications Using spark-submit
PySpark applications are executed using the spark-submit (Spark Core) command-line application:
```text
spark-submit 1.py extra args
```
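Any PySpark script works as `1.py` for this demo. A minimal sketch (printing the command-line arguments is just an assumption here, to make the `extra args` visible):

```python
# 1.py -- a minimal PySpark application assumed for this demo
import sys

from pyspark.sql import SparkSession

if __name__ == "__main__":
    spark = SparkSession.builder.getOrCreate()
    # The "extra args" given after 1.py on the spark-submit
    # command line arrive in sys.argv
    print("Arguments:", sys.argv[1:])
    spark.stop()
```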
For a PySpark application, spark-submit
uses PythonRunner and launches an extra python process:
```text
$ ps -o pid,ppid,command | grep python | grep -v grep
org.apache.spark.deploy.SparkSubmit 1.py extra args
Python /usr/local/bin/ipython 1.py extra args
```
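The same parent-child relationship can be observed from inside the application by printing the process IDs (a sketch; in `client` deploy mode the parent is expected to be the `SparkSubmit` JVM):

```python
import os

# When run via spark-submit, this python process is a child of the JVM
# running org.apache.spark.deploy.SparkSubmit (which hosts PythonRunner),
# so the parent PID below should match the SparkSubmit process in ps.
print("python PID:", os.getpid())
print("parent PID:", os.getppid())
```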
## SPARK_PRINT_LAUNCH_COMMAND Environment Variable
Use the SPARK_PRINT_LAUNCH_COMMAND environment variable to have the complete Spark command printed out to the standard output (cf. the spark-submit shell script):
```text
SPARK_PRINT_LAUNCH_COMMAND=1 spark-submit 1.py extra args
```
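The complete launch command is then printed out (a line starting with `Spark Command:` followed by the full `java` invocation) before the application is executed.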
## --verbose Option
Use the --verbose option for verbose debugging output, as shown below.
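For example, rerunning the demo application with the option:

```text
spark-submit --verbose 1.py extra args
```

That prints out the following (abridged):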
```text
Parsed arguments:
  ...
  pyFiles                 null
  ...
  primaryResource         file:/Users/jacek/dev/sandbox/python-sandbox/1.py
  name                    1.py
  childArgs               [extra args]
  ...
Main class:
org.apache.spark.deploy.PythonRunner
Arguments:
file:/Users/jacek/dev/sandbox/python-sandbox/1.py
null
extra
args
Spark config:
(spark.app.name,1.py)
(spark.master,local[*])
(spark.submit.pyFiles,)
(spark.submit.deployMode,client)
```
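As a sanity check, the same Spark config entries can be read back from inside a running application using the SparkSession conf API (a sketch; the keys are taken from the output above):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# The keys below match the "Spark config" section of the --verbose output
for key in ("spark.app.name", "spark.master",
            "spark.submit.pyFiles", "spark.submit.deployMode"):
    print(key, "=", spark.conf.get(key))
```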