pyspark Shell Script

The pyspark shell script runs spark-submit with pyspark-shell-main as the application resource (the first argument), followed by the --name "PySparkShell" option and any other command-line arguments that were specified.
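The resulting command line can be sketched as follows. This is an illustration in Python only (the script itself is a shell script), and the helper name build_pyspark_command is hypothetical:

```python
def build_pyspark_command(extra_args=()):
    """Assemble the spark-submit argument list described above (illustrative only)."""
    # pyspark-shell-main is the application resource (first argument),
    # --name "PySparkShell" follows, then any user-supplied arguments.
    return ["spark-submit", "pyspark-shell-main", "--name", "PySparkShell", *extra_args]


if __name__ == "__main__":
    print(build_pyspark_command(["--master", "local[2]"]))
```

Anything passed to pyspark on the command line ends up after the --name option, so spark-submit sees it as regular command-line options.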

pyspark/shell.py

Learn more about pyspark/shell.py in The Internals of PySpark.

The pyspark/shell.py module is launched as a PYTHONSTARTUP script.

Environment Variables

The pyspark script exports the following environment variables:

OLD_PYTHONSTARTUP

pyspark defines the OLD_PYTHONSTARTUP environment variable to hold the initial value of PYTHONSTARTUP (before PYTHONSTARTUP gets redefined).

The idea of OLD_PYTHONSTARTUP is to delay execution of the user's own Python startup script until pyspark/shell.py finishes.
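A minimal sketch of how a startup module could run the delayed script once its own setup is done. It assumes pyspark/shell.py reads OLD_PYTHONSTARTUP as its last step; the function name run_old_startup and its parameters are hypothetical and not part of PySpark:

```python
import os


def run_old_startup(namespace, environ=os.environ):
    """Execute the file named by OLD_PYTHONSTARTUP, if any, in the given namespace.

    Returns True when a startup file was found and executed, False otherwise.
    """
    path = environ.get("OLD_PYTHONSTARTUP")
    if path and os.path.isfile(path):
        with open(path) as f:
            # Compile with the real file name so tracebacks point at the user's script.
            code = compile(f.read(), path, "exec")
        exec(code, namespace)
        return True
    return False
```

Calling run_old_startup(globals()) at the end of a startup module would make the names defined by the user's script available in the interactive session, much as if Python had executed that script directly.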

PYSPARK_PYTHON

The PYSPARK_PYTHON environment variable can be used to specify the Python executable that runs PySpark scripts.

The Internals of PySpark

Learn more about PySpark in The Internals of PySpark.

PYSPARK_PYTHON can be overridden by PYSPARK_DRIVER_PYTHON and by configuration properties when SparkSubmitCommandBuilder is requested to buildPySparkShellCommand.

In particular, PYSPARK_PYTHON is overridden by the spark.pyspark.python configuration property, if defined.
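The override rules can be sketched as a "first non-empty value wins" lookup. This is an illustration in Python, not the Java code of SparkSubmitCommandBuilder. The exact order, the spark.pyspark.driver.python property and the python3 fallback are assumptions that may differ between Spark versions, and the function name resolve_python_executable is hypothetical:

```python
def resolve_python_executable(conf, env, default="python3"):
    """Pick the Python executable for the PySpark shell (illustrative only).

    conf: mapping of Spark configuration properties
    env:  mapping of environment variables
    """
    candidates = [
        conf.get("spark.pyspark.driver.python"),  # assumed highest precedence
        conf.get("spark.pyspark.python"),
        env.get("PYSPARK_DRIVER_PYTHON"),
        env.get("PYSPARK_PYTHON"),
    ]
    for candidate in candidates:
        if candidate:  # skip unset (None) and empty values
            return candidate
    return default  # assumed fallback when nothing is configured
```

With this ordering, PYSPARK_PYTHON is used only when neither PYSPARK_DRIVER_PYTHON nor the configuration properties are set.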

PYTHONSTARTUP

From Python Documentation:

PYTHONSTARTUP

If this is the name of a readable file, the Python commands in that file are executed before the first prompt is displayed in interactive mode. The file is executed in the same namespace where interactive commands are executed so that objects defined or imported in it can be used without qualification in the interactive session. You can also change the prompts sys.ps1 and sys.ps2 and the hook sys.__interactivehook__ in this file.

pyspark (re)defines the PYTHONSTARTUP environment variable to point to the pyspark/shell.py module:

${SPARK_HOME}/python/pyspark/shell.py

OLD_PYTHONSTARTUP

The initial value of the PYTHONSTARTUP environment variable is available as OLD_PYTHONSTARTUP.
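The two environment variables can be pictured with the following sketch. It models the environment as a plain dictionary rather than reproducing the shell script. The function name prepare_startup_env is hypothetical, and using an empty string when PYTHONSTARTUP was not set is an assumption:

```python
def prepare_startup_env(env, spark_home):
    """Return a copy of env with PYTHONSTARTUP redirected to pyspark/shell.py."""
    new_env = dict(env)
    # Remember the user's startup script (assumed empty when it was not set).
    new_env["OLD_PYTHONSTARTUP"] = env.get("PYTHONSTARTUP", "")
    # Point the interactive interpreter at the PySpark startup module.
    new_env["PYTHONSTARTUP"] = f"{spark_home}/python/pyspark/shell.py"
    return new_env
```

The input dictionary is left untouched, which keeps the before and after states easy to compare.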