pyspark Shell Script

The pyspark shell script runs spark-submit with pyspark-shell-main as the application resource (the first argument), followed by the --name "PySparkShell" option and any other command-line arguments, if specified.
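The command line described above can be sketched as follows. This is a minimal illustration in Python rather than the script itself: the helper name `build_pyspark_shell_command` is hypothetical, and the sketch assumes that user-supplied arguments are simply appended after the --name option.

```python
from typing import List, Sequence


def build_pyspark_shell_command(user_args: Sequence[str] = ()) -> List[str]:
    """Return the argument list that would be handed to spark-submit.

    Hypothetical helper for illustration only; it is not part of PySpark.
    """
    command = [
        "spark-submit",
        "pyspark-shell-main",      # application resource (the first argument)
        "--name", "PySparkShell",  # application name of the shell
    ]
    # Any other command-line arguments follow, if specified.
    command.extend(user_args)
    return command


example = build_pyspark_shell_command(["--master", "local[2]"])
print(" ".join(example))
```

Because the user arguments come last, a later --name given by the user would appear after the default one on the command line.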



The pyspark/ module is launched as a PYTHONSTARTUP script.

Environment Variables

The pyspark script exports the environment variables described below.


pyspark defines the OLD_PYTHONSTARTUP environment variable to hold the initial value of PYTHONSTARTUP (before it gets redefined).

The idea of OLD_PYTHONSTARTUP is to delay execution of the user's own Python startup script until pyspark/ finishes.
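A startup module can honour this by running the saved script as its last step. The following is a minimal sketch under stated assumptions: the function name `run_old_pythonstartup` and its `namespace` and `environ` parameters are hypothetical and chosen for illustration, not taken from PySpark.

```python
import os
import tempfile


def run_old_pythonstartup(namespace: dict, environ=os.environ) -> bool:
    """Execute the script named by OLD_PYTHONSTARTUP in `namespace`.

    Returns True if a script was found and executed, False otherwise.
    Hypothetical helper for illustration only.
    """
    path = environ.get("OLD_PYTHONSTARTUP")
    if not path or not os.path.isfile(path):
        return False
    with open(path) as f:
        code = compile(f.read(), path, "exec")
    # Run in the given namespace so that the names the script defines
    # stay available afterwards, as with a regular startup script.
    exec(code, namespace)
    return True


# Demo: a throwaway file plays the role of the user's startup script.
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as tmp:
    tmp.write("greeting = 'hello from the user startup script'\n")

shell_namespace = {}
ran = run_old_pythonstartup(shell_namespace, {"OLD_PYTHONSTARTUP": tmp.name})
os.remove(tmp.name)
```

Calling the helper at the very end of the startup module means the user's script sees everything the module has already set up.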


The PYSPARK_PYTHON environment variable specifies the Python executable used to run PySpark scripts.

PYSPARK_PYTHON can be overridden by PYSPARK_DRIVER_PYTHON and by configuration properties when SparkSubmitCommandBuilder is requested to buildPySparkShellCommand.

PYSPARK_PYTHON is overridden by the spark.pyspark.python configuration property, if defined, when SparkSubmitCommandBuilder is requested to buildPySparkShellCommand.
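The override rules above amount to a "first non-empty value wins" lookup. The sketch below is an illustration in Python, not the SparkSubmitCommandBuilder code. The exact order, the spark.pyspark.driver.python property (not mentioned in the text above) and the `python3` fallback are assumptions, and the function names are hypothetical.

```python
from typing import Mapping, Optional


def first_non_empty(*candidates: Optional[str]) -> Optional[str]:
    """Return the first candidate that is neither None nor empty."""
    for candidate in candidates:
        if candidate:
            return candidate
    return None


def resolve_driver_python(
    conf: Mapping[str, str],
    env: Mapping[str, str],
    default: str = "python3",  # assumed fallback, not stated in the text
) -> Optional[str]:
    """Pick the Python executable for the shell (assumed precedence)."""
    return first_non_empty(
        conf.get("spark.pyspark.driver.python"),  # assumption
        conf.get("spark.pyspark.python"),
        env.get("PYSPARK_DRIVER_PYTHON"),
        env.get("PYSPARK_PYTHON"),
        default,
    )


chosen = resolve_driver_python(
    conf={"spark.pyspark.python": "/opt/python/bin/python"},
    env={"PYSPARK_PYTHON": "python3.9"},
)
print(chosen)
```

In this sketch, configuration properties are consulted before environment variables, so PYSPARK_PYTHON only takes effect when nothing more specific is set.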


From the Python documentation on PYTHONSTARTUP:


If this is the name of a readable file, the Python commands in that file are executed before the first prompt is displayed in interactive mode. The file is executed in the same namespace where interactive commands are executed so that objects defined or imported in it can be used without qualification in the interactive session. You can also change the prompts sys.ps1 and sys.ps2 and the hook sys.__interactivehook__ in this file.

pyspark (re)defines the PYTHONSTARTUP environment variable to be the pyspark/ module.



The initial value of the PYTHONSTARTUP environment variable is available as OLD_PYTHONSTARTUP.
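The save-then-redefine step can be sketched as a pure function over an environment mapping. This is a minimal illustration, assuming an unset PYTHONSTARTUP is saved as an empty string. The function name `redefine_pythonstartup` and the placeholder path are hypothetical and do not show the actual location of the startup module.

```python
from typing import Dict, Mapping


def redefine_pythonstartup(env: Mapping[str, str], startup_module: str) -> Dict[str, str]:
    """Return a copy of `env` with PYTHONSTARTUP pointing at `startup_module`.

    The previous value is preserved as OLD_PYTHONSTARTUP.
    Hypothetical helper for illustration only.
    """
    new_env = dict(env)
    # Save the initial value first (empty string if it was not set) ...
    new_env["OLD_PYTHONSTARTUP"] = env.get("PYTHONSTARTUP", "")
    # ... then point PYTHONSTARTUP at the shell's startup module.
    new_env["PYTHONSTARTUP"] = startup_module
    return new_env


before = {"PYTHONSTARTUP": "/home/user/.pythonrc"}
# Placeholder path, used only to make the example concrete.
after = redefine_pythonstartup(before, "/path/to/startup_module.py")
print(after["PYTHONSTARTUP"], after["OLD_PYTHONSTARTUP"])
```

The order matters: saving the old value after the redefinition would lose the user's original setting.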