SparkSession — Spark Connect Scala Client

SparkSession is a Spark Connect Scala Client.

SparkSession is an api.SparkSession.

Spark SQL's SparkSession

There is also a Spark SQL variant of api.SparkSession.

Creating Instance

SparkSession takes the following to be created:

SparkSession is created when:

Create SparkSession

create(
  configuration: Configuration): SparkSession

create creates a new SparkSession for a SparkConnectClient (based on the given Configuration) and this Plan ID Generator.


create is used when:

Execute Command

execute(
  command: proto.Command): Seq[ExecutePlanResponse]

DeveloperApi

execute is a DeveloperApi.

execute creates a new proto.Plan for the given proto.Command and passes it to executeInternal.


execute is used when:

executeInternal

executeInternal(
  plan: proto.Plan): CloseableIterator[ExecutePlanResponse]

executeInternal requests the SparkConnectClient to execute the given proto.Plan.

With an ExecutePlanResponse, executeInternal calls processRegisteredObservedMetrics.
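
The relationship between execute and executeInternal can be sketched with a small standalone model. This is not the Spark Connect source: Command, Plan, Response, Client and SessionSketch are hypothetical stand-ins for proto.Command, proto.Plan, ExecutePlanResponse, SparkConnectClient and SparkSession, and the sketch assumes the metrics hook runs once per response as the iterator is consumed.

```scala
// Hypothetical stand-ins for proto.Command, proto.Plan and ExecutePlanResponse.
final case class Command(name: String)
final case class Plan(command: Command)
final case class Response(metrics: Seq[String])

// Hypothetical stand-in for SparkConnectClient.
trait Client {
  def execute(plan: Plan): Iterator[Response]
}

final class SessionSketch(client: Client) {
  // Records what the metrics hook has seen (for illustration only).
  val observed = scala.collection.mutable.ArrayBuffer.empty[String]

  private def processRegisteredObservedMetrics(metrics: Seq[String]): Unit =
    observed ++= metrics

  // Delegates to the client; the hook runs on each response as it is consumed
  // (ASSUMPTION: once per response).
  def executeInternal(plan: Plan): Iterator[Response] =
    client.execute(plan).map { response =>
      processRegisteredObservedMetrics(response.metrics)
      response
    }

  // Wraps the command in a plan and materializes all responses.
  def execute(command: Command): Seq[Response] =
    executeInternal(Plan(command)).toVector
}
```

The model materializes the responses eagerly in execute, which matches the Seq return type shown above, while executeInternal stays lazy like the CloseableIterator it stands in for.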

Run Local Connect Server to Execute Code

withLocalConnectServer[T](
  f: => T): T

withLocalConnectServer finds the Spark remote URL based on the following:

  1. spark.remote in the sparkOptions
  2. spark.remote in the system properties (-D)
  3. SPARK_REMOTE environment variable
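
The lookup above can be sketched as a chain of fallbacks. This is a minimal illustration, assuming the numbered list is a precedence order; RemoteResolution and resolveRemote are hypothetical names, not part of the Spark Connect client.

```scala
object RemoteResolution {
  // Returns the first defined Spark remote URL, trying the sources in the listed order:
  // sparkOptions, then system properties, then the SPARK_REMOTE environment variable.
  def resolveRemote(
      sparkOptions: Map[String, String],
      systemProps: scala.collection.Map[String, String] = sys.props,
      env: scala.collection.Map[String, String] = sys.env): Option[String] =
    sparkOptions.get("spark.remote")
      .orElse(systemProps.get("spark.remote"))
      .orElse(env.get("SPARK_REMOTE"))
}
```

Passing the system properties and environment as parameters (with the real ones as defaults) keeps the sketch easy to exercise without touching the JVM's global state.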

withLocalConnectServer checks that all of the following conditions are met before starting up a new Spark Connect server:

  • The server process has not been assigned yet
  • The Spark remote URL starts with local
  • The SPARK_HOME environment variable is defined and points to a directory that contains the sbin/start-connect-server.sh shell script
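
The three conditions can be expressed as a single predicate. The following is a rough sketch using only the JDK file API; LocalServerCheck, shouldStartServer and its parameter names are hypothetical and only mirror the conditions listed above.

```scala
import java.nio.file.{Files, Paths}

object LocalServerCheck {
  // True only when all three listed conditions hold.
  def shouldStartServer(
      serverAlreadyAssigned: Boolean,
      remote: Option[String],
      sparkHome: Option[String]): Boolean = {
    // SPARK_HOME must be a directory that contains sbin/start-connect-server.sh.
    val hasStartScript = sparkHome.exists { home =>
      Files.isDirectory(Paths.get(home)) &&
        Files.exists(Paths.get(home, "sbin", "start-connect-server.sh"))
    }
    !serverAlreadyAssigned && remote.exists(_.startsWith("local")) && hasStartScript
  }
}
```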

withLocalConnectServer starts a new Spark Connect server with the following command:

$SPARK_HOME/sbin/start-connect-server.sh --master [localURL] [sparkOptions]
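
As an illustration, the command line could be assembled as a sequence of arguments. StartCommand and build are hypothetical names, and the rendering of [sparkOptions] as repeated --conf key=value pairs is an assumption; the command above does not spell out how the options are passed.

```scala
object StartCommand {
  // Builds the argument list for start-connect-server.sh.
  // ASSUMPTION: each Spark option is rendered as `--conf key=value`.
  def build(
      sparkHome: String,
      localURL: String,
      sparkOptions: Map[String, String]): Seq[String] = {
    val script =
      java.nio.file.Paths.get(sparkHome, "sbin", "start-connect-server.sh").toString
    // Sort by key so the resulting command line is deterministic.
    val confArgs = sparkOptions.toSeq.sortBy(_._1).flatMap {
      case (key, value) => Seq("--conf", s"$key=$value")
    }
    Seq(script, "--master", localURL) ++ confArgs
  }
}
```

A sequence built this way could be handed to java.lang.ProcessBuilder, which avoids shell quoting issues with values such as local[*].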

In the end, withLocalConnectServer executes the given f block.


withLocalConnectServer is used when: