
Demo: Executing PySpark Applications Using spark-submit

PySpark applications are executed using the spark-submit (Spark Core) command-line application.

spark-submit 1.py extra args
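The demo does not show the contents of 1.py; a minimal sketch of such an application (only the file name and its arguments come from the command above, everything else is an assumption) could be:

# 1.py -- a minimal PySpark application (an assumed sketch;
# the demo does not show the actual file)
import sys

from pyspark.sql import SparkSession

if __name__ == "__main__":
    spark = SparkSession.builder.getOrCreate()
    # sys.argv[1:] holds the command-line arguments, e.g. ['extra', 'args']
    print(sys.argv[1:])
    spark.stop()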

For a PySpark application, spark-submit uses PythonRunner and launches an extra Python process, as the (abbreviated) process listing shows:

ps -o pid,ppid,command | grep python | grep -v grep
org.apache.spark.deploy.SparkSubmit 1.py extra args
Python /usr/local/bin/ipython 1.py extra args
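The driver's Python executable can be changed using the PYSPARK_DRIVER_PYTHON environment variable (PYSPARK_PYTHON for the executors), which is presumably how ipython ends up in the listing above:

PYSPARK_DRIVER_PYTHON=ipython spark-submit 1.py extra args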

SPARK_PRINT_LAUNCH_COMMAND Environment Variable

Use the SPARK_PRINT_LAUNCH_COMMAND environment variable to have the complete Spark command printed out to the standard output (cf. the spark-submit shell script).

SPARK_PRINT_LAUNCH_COMMAND=1 spark-submit 1.py extra args
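The complete command is then printed out before the application starts, along these lines (the actual classpath and JVM options elided):

Spark Command: .../bin/java -cp ... org.apache.spark.deploy.SparkSubmit 1.py extra args
========================================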

--verbose Option

Use the --verbose option for verbose debugging output.
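For example, re-running the demo application with --verbose prints out (among others) the parsed arguments, the main class and the Spark configuration:

spark-submit --verbose 1.py extra args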

Parsed arguments:
  ...
  pyFiles                 null
  ...
  primaryResource         file:/Users/jacek/dev/sandbox/python-sandbox/1.py
  name                    1.py
  childArgs               [extra args]
...
Main class:
org.apache.spark.deploy.PythonRunner
Arguments:
file:/Users/jacek/dev/sandbox/python-sandbox/1.py
null
extra
args
Spark config:
(spark.app.name,1.py)
(spark.master,local[*])
(spark.submit.pyFiles,)
(spark.submit.deployMode,client)
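The same settings are also available from within the application on the driver's SparkConf. A quick way to inspect them (a sketch, assuming the 1.py application above):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
# Prints entries such as ('spark.master', 'local[*]')
# and ('spark.submit.deployMode', 'client')
for key, value in spark.sparkContext.getConf().getAll():
    print(key, value)
spark.stop()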