Spark Connect
Spark Connect is a client-server interface for Apache Spark that provides remote connectivity to Spark clusters. Clients use the DataFrame API and send unresolved logical plans to the server over a protocol based on gRPC Java.
Spark Connect has been available since Apache Spark 3.4.
The Spark Connect server can be started using the sbin/start-connect-server.sh shell script.
$ ./sbin/start-connect-server.sh
starting org.apache.spark.sql.connect.service.SparkConnectServer, logging to...
$ tail -1 logs/spark-jacek-org.apache.spark.sql.connect.service.SparkConnectServer-1-Jaceks-Mac-mini.local.out
... Spark Connect server started.
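Besides tailing the log, a quick way to confirm that the server accepts connections is to probe its port. The following is a minimal sketch, not part of Spark itself: the helper name `is_server_listening` is hypothetical, and it assumes the server runs on `localhost` with the default Spark Connect port 15002 (adjust both if your setup differs).

```python
import socket

# Assumption: 15002 is the default Spark Connect port; change it if the
# server was started with a different port.
DEFAULT_CONNECT_PORT = 15002


def is_server_listening(host="localhost", port=DEFAULT_CONNECT_PORT, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    This only checks that something is listening on the port; it does not
    verify that the listener actually speaks the Spark Connect protocol.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


if __name__ == "__main__":
    print("Spark Connect server reachable:", is_server_listening())
```

A `True` result only means the port is open, so the log line above remains the authoritative confirmation that the server has started.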
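Use Spark Connect for interactive analysis by starting the PySpark shell with the --remote option pointing at the server, for example ./bin/pyspark --remote "sc://localhost".
The PySpark shell welcome message then tells you that you have connected to Spark using Spark Connect.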
Check the Spark session type:
SparkSession available as 'spark'.
>>> type(spark)
<class 'pyspark.sql.connect.session.SparkSession'>
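The same check can be made from a standalone script rather than the shell. The sketch below assumes a Spark Connect server reachable at `sc://localhost` and a PySpark installation with Spark Connect support. The helper names `connect` and `uses_spark_connect` are hypothetical; `SparkSession.builder.remote` is the PySpark call that creates a Connect session.

```python
def uses_spark_connect(session) -> bool:
    """Return True if the session class is defined in a '.connect.' module.

    For a Spark Connect session the class lives in
    'pyspark.sql.connect.session', so its module path contains '.connect.'.
    """
    return ".connect." in type(session).__module__


def connect(url="sc://localhost"):
    """Create (or reuse) a Spark Connect session for the given URL.

    Assumption: a Spark Connect server is already running at `url`.
    """
    # Imported lazily so uses_spark_connect() works without PySpark installed.
    from pyspark.sql import SparkSession

    return SparkSession.builder.remote(url).getOrCreate()


if __name__ == "__main__":
    spark = connect()
    print(type(spark))
    print("Using Spark Connect:", uses_spark_connect(spark))
```

Checking the module path rather than comparing classes directly keeps the helper usable even where the Connect client classes are not importable.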
Now you can run PySpark code in the shell to see Spark Connect in action:
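As a minimal sketch of such code, the snippet below builds a two-row DataFrame and prints it. The sample data and the `run_demo` helper are illustrative assumptions; `spark` is expected to be the Spark Connect session that the shell provides.

```python
# Illustrative sample data (hypothetical values, not from a real dataset).
COLUMNS = ["id", "name"]
DATA = [(1, "Sarah"), (2, "Maria")]


def run_demo(spark):
    """Build a small DataFrame through the given session and print it.

    With a Spark Connect session, the DataFrame operations are sent to the
    server as a plan and executed there; only the result comes back.
    """
    # createDataFrame infers the schema; toDF assigns the column names.
    df = spark.createDataFrame(DATA).toDF(*COLUMNS)
    df.show()
    return df
```

In the PySpark shell, calling `run_demo(spark)` runs the example against the connected server.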