Spark Connect

PySpark supports connecting to remote Spark clusters using Spark Connect, a client-server interface for Spark SQL.
```
$ ./bin/pyspark --help
Usage: ./bin/pyspark [options]

Options:
 Spark Connect only:
  --remote CONNECT_URL       URL to connect to the server for Spark Connect, e.g.,
                             sc://host:port. --master and --deploy-mode cannot be set
                             together with this option. This option is experimental, and
                             might change between minor releases.
...
```
Spark Connect for Python requires the following Python libraries:

Module | Version
---|---
pandas | 1.0.5
pyarrow | 1.0.0
grpc | 1.48.1
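Before launching the client, it can be handy to confirm that these libraries are installed. A minimal sketch using the standard library's `importlib.metadata` (the distribution names passed in are assumptions, e.g. the grpc module is distributed on PyPI as `grpcio`):

```python
from importlib.metadata import version, PackageNotFoundError

def installed_versions(packages):
    # Map each distribution name to its installed version,
    # or None when that distribution is not installed.
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found

# Distribution names assumed here; "grpcio" provides the grpc module.
print(installed_versions(["pandas", "pyarrow", "grpcio"]))
```

Any entry reported as `None` should be installed (for example with `pip install`) before starting `pyspark --remote`.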
```
# switch to a conda environment that has the libraries installed
$ conda activate pyspark
$ ./bin/pyspark --remote sc://localhost
Python 3.10.10 (main, Mar 21 2023, 13:41:39) [Clang 14.0.6 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /__ / .__/\_,_/_/ /_/\_\   version 3.4.0
      /_/

Using Python version 3.10.10 (main, Mar 21 2023 13:41:39)
Client connected to the Spark Connect server at localhost
SparkSession available as 'spark'.
>>> spark.client
<pyspark.sql.connect.client.SparkConnectClient object at 0x7fed8867ab90>
```
is_remote

```python
from pyspark.sql.utils import is_remote

is_remote() -> bool
```

`is_remote()` returns `True` when the `SPARK_REMOTE` environment variable is defined (in `os.environ`).
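The documented behavior can be sketched without a Spark installation; this is a stand-in that mirrors the description above, not the pyspark implementation itself:

```python
import os

def is_remote_sketch() -> bool:
    # Mirrors the documented behavior: True exactly when the
    # SPARK_REMOTE environment variable is present in os.environ.
    return "SPARK_REMOTE" in os.environ

os.environ.pop("SPARK_REMOTE", None)
print(is_remote_sketch())  # prints False while the variable is unset

os.environ["SPARK_REMOTE"] = "sc://localhost"
print(is_remote_sketch())  # prints True once it is defined
```

Launching `pyspark --remote` sets this variable for the session, which is why `is_remote` can be used to detect Spark Connect mode at runtime.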