PySpark Setup

Install IPython

Follow the installation steps in the official IPython documentation.

pip install ipython
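
To confirm that the install landed in the interpreter PySpark will use, a quick check along these lines may help. This is a minimal sketch rather than part of the official instructions; `has_module` is a hypothetical helper name and the code relies only on the standard library.

```python
import importlib.util
import sys


def has_module(name: str) -> bool:
    """Return True if `name` can be imported by the current interpreter."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # Print the interpreter path, since pip may have targeted a different one.
    print(sys.executable)
    print("IPython importable:", has_module("IPython"))
```

If the check reports False, the usual cause is that `pip` belongs to a different Python installation than the one on the PATH.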

Start PySpark

export PYSPARK_DRIVER_PYTHON=ipython

For Java 11, pass -Dio.netty.tryReflectionSetAccessible=true as a driver JVM option; the Apache Arrow library requires it (see Downloading in the official documentation of Apache Spark).

./bin/pyspark --driver-java-options=-Dio.netty.tryReflectionSetAccessible=true
Python 3.9.1 (default, Feb  3 2021, 07:38:02)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.20.0 -- An enhanced Interactive Python. Type '?' for help.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 3.1.1
      /_/

Using Python version 3.9.1 (default, Feb  3 2021 07:38:02)
Spark context Web UI available at http://192.168.68.101:4040
Spark context available as 'sc' (master = local[*], app id = local-1613571272142).
SparkSession available as 'spark'.

In [1]:
In [1]: spark.version
Out[1]: '3.1.1'
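
The launch steps above can also be scripted. The following is a minimal sketch, assuming Spark is unpacked under a directory passed in as `spark_home`; `pyspark_launch` and its `java_major` parameter are hypothetical names introduced for illustration, not part of Spark. It builds the command line and environment only and does not start the shell.

```python
import os

# JVM option the text above recommends for Java 11.
JAVA11_OPTION = "-Dio.netty.tryReflectionSetAccessible=true"


def pyspark_launch(spark_home, java_major):
    """Return (command, environment) for starting PySpark with IPython."""
    # Equivalent of `export PYSPARK_DRIVER_PYTHON=ipython`.
    env = dict(os.environ, PYSPARK_DRIVER_PYTHON="ipython")
    cmd = [os.path.join(spark_home, "bin", "pyspark")]
    if java_major == 11:
        # Only added for Java 11, as described above.
        cmd.append("--driver-java-options=" + JAVA11_OPTION)
    return cmd, env


if __name__ == "__main__":
    cmd, env = pyspark_launch(".", 11)
    print(" ".join(cmd))
    # To actually start the shell, something like
    # subprocess.run(cmd, env=env) could be used.
```

Returning the command and environment instead of running them keeps the helper easy to inspect before anything is launched.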