SparkSession: Spark Connect Scala Client
SparkSession is the Spark Connect Scala Client's implementation of api.SparkSession.
Spark SQL's SparkSession
There is also a Spark SQL variant of api.SparkSession.
Creating Instance
SparkSession takes the following to be created:
- SparkConnectClient
- Plan ID Generator (AtomicLong)
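A minimal sketch of what the two constructor arguments could look like, assuming the Plan ID Generator hands out unique, increasing IDs for the plans a session builds. ConnectClientStub, SessionSketch and newPlanId are hypothetical names for illustration only, not the Spark Connect API.

```scala
import java.util.concurrent.atomic.AtomicLong

// Hypothetical stand-in for SparkConnectClient (not the real class).
final case class ConnectClientStub(endpoint: String)

// Hypothetical model of a session built from a client and a Plan ID Generator.
final class SessionSketch(val client: ConnectClientStub, planIdGenerator: AtomicLong) {
  // Assumption: each new plan takes the next ID from the AtomicLong.
  // getAndIncrement is atomic, so concurrent callers never receive the same ID.
  def newPlanId(): Long = planIdGenerator.getAndIncrement()
}
```

Using an AtomicLong rather than a plain counter keeps ID allocation thread-safe without explicit locking.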
SparkSession is created when:
- SparkSession utility is used to create a SparkSession (for a Configuration)
- SparkSession.Builder is requested to tryCreateSessionFromClient
Create SparkSession
create creates a new SparkSession for a SparkConnectClient (based on the given Configuration) and the Plan ID Generator.
create is used when:
- SparkSession utility is used to load a SparkSession from a cache
- SparkSession.Builder is requested to create a SparkSession
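The factory shape of create can be sketched as follows, assuming the Plan ID Generator lives in the companion object and is shared by every session create builds, so plan IDs stay unique across sessions. ConfigSketch, ClientSketch and SharedIdSession are hypothetical simplifications; the real Configuration carries many more settings, and 15002 is used here only as the usual default Spark Connect port.

```scala
import java.util.concurrent.atomic.AtomicLong

// Hypothetical, simplified connection configuration.
final case class ConfigSketch(host: String = "localhost", port: Int = 15002) {
  // Assumption: the client is derived from the configuration's endpoint.
  def toClient: ClientSketch = ClientSketch(s"sc://$host:$port")
}

// Hypothetical stand-in for SparkConnectClient.
final case class ClientSketch(url: String)

final class SharedIdSession private (val client: ClientSketch, planIdGenerator: AtomicLong) {
  def newPlanId(): Long = planIdGenerator.getAndIncrement()
}

object SharedIdSession {
  // One generator shared by all sessions that create builds.
  private val planIdGenerator = new AtomicLong(0L)

  // Builds a client from the configuration and pairs it with the shared generator.
  def create(configuration: ConfigSketch): SharedIdSession =
    new SharedIdSession(configuration.toClient, planIdGenerator)
}
```

Two sessions created this way draw from the same counter, so their plan IDs interleave rather than collide.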
Execute Command
DeveloperApi
execute is a DeveloperApi.
execute creates a new proto.Plan for the given proto.Command and executes it using executeInternal.
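The wrap-then-delegate flow of execute can be sketched with plain case classes. Command, Plan and Response are hypothetical stand-ins for proto.Command, proto.Plan and ExecutePlanResponse, and the send function stands in for the SparkConnectClient; none of this is the actual client code.

```scala
// Hypothetical, simplified stand-ins for the protobuf messages.
final case class Command(sql: String)
final case class Plan(command: Command)
final case class Response(description: String)

final class ExecuteSketch(send: Plan => Seq[Response]) {
  // execute: wrap the command in a brand-new plan and hand it to executeInternal.
  def execute(command: Command): Seq[Response] = {
    val plan = Plan(command)
    executeInternal(plan)
  }

  // executeInternal: in this sketch, simply delegates to the stubbed client.
  private def executeInternal(plan: Plan): Seq[Response] = send(plan)
}
```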
execute is used when:
- Dataset is requested to checkpoint and createTempView
- SparkSession is requested to registerUdf
- DataFrameWriterImpl is requested to executeWriteOperation
- DataFrameWriterV2Impl is requested to executeWriteOperation
- MergeIntoWriterImpl is requested to merge
- SessionCleaner is requested to doCleanupCachedRemoteRelation
- DataStreamWriter is requested to start
- RemoteStreamingQuery is requested to executeQueryCmd
- StreamingQueryListenerBus is requested to remove a StreamingQueryListener
- StreamingQueryManager is requested to executeManagerCmd
executeInternal
executeInternal requests the SparkConnectClient to execute the given proto.Plan.
With an ExecutePlanResponse, executeInternal processes the registered observed metrics (processRegisteredObservedMetrics).
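A rough sketch of the response handling, assuming observed metrics are the ones requested up front (for example through Dataset.observe) and registered under a plan ID, and that each response may carry metrics for that plan ID. ResponseSketch, ObservationRegistry and ExecuteInternalSketch are hypothetical names; the real processRegisteredObservedMetrics may differ in detail.

```scala
import scala.collection.mutable

// Hypothetical response: a plan ID plus any observed metrics reported for it.
final case class ResponseSketch(planId: Long, observedMetrics: Map[String, Long])

// Hypothetical registry of observation callbacks keyed by plan ID.
final class ObservationRegistry {
  private val callbacks = mutable.Map.empty[Long, Map[String, Long] => Unit]

  def register(planId: Long)(onMetrics: Map[String, Long] => Unit): Unit =
    callbacks(planId) = onMetrics

  // Deliver non-empty metrics to the callback registered for the response's plan ID, if any.
  def process(response: ResponseSketch): Unit =
    if (response.observedMetrics.nonEmpty)
      callbacks.get(response.planId).foreach(callback => callback(response.observedMetrics))
}

final class ExecuteInternalSketch(
    client: Long => Iterator[ResponseSketch],
    registry: ObservationRegistry) {
  // Iterator.map is lazy: metrics are processed as the caller consumes each response.
  def executeInternal(planId: Long): Iterator[ResponseSketch] =
    client(planId).map { response =>
      registry.process(response)
      response
    }
}
```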
Run Local Connect Server to Execute Code
withLocalConnectServer finds the Spark remote URL based on the following:
- spark.remote in the sparkOptions
- spark.remote in the system properties (-D)
- SPARK_REMOTE environment variable
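The lookup can be sketched as a chain of Option fallbacks, assuming the three sources are checked in the listed order and the first one that defines a value wins. RemoteUrlSketch and findRemote are hypothetical names; the system properties and environment are parameters (defaulting to sys.props and sys.env) so the precedence can be exercised without touching the real JVM state.

```scala
object RemoteUrlSketch {
  // First defined value wins: sparkOptions, then system properties, then the environment.
  def findRemote(
      sparkOptions: Map[String, String],
      systemProps: collection.Map[String, String] = sys.props,
      env: Map[String, String] = sys.env): Option[String] =
    sparkOptions.get("spark.remote")
      .orElse(systemProps.get("spark.remote"))
      .orElse(env.get("SPARK_REMOTE"))
}
```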
withLocalConnectServer makes sure that the following are all met before starting up a new Spark Connect server:
- The server process has not been assigned yet
- The Spark remote URL starts with local
- The SPARK_HOME environment variable is defined and points to a directory with the sbin/start-connect-server.sh shell script
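The three preconditions can be sketched as a single check that returns the start script only when all of them hold. LocalServerGuardSketch and startScriptIfNeeded are hypothetical names, and the inputs are passed in explicitly rather than read from the real session state and environment.

```scala
import java.nio.file.{Files, Path, Paths}

object LocalServerGuardSketch {
  // Returns the start script when all three conditions hold, otherwise None.
  def startScriptIfNeeded(
      serverAlreadyAssigned: Boolean,
      remote: Option[String],
      sparkHome: Option[String]): Option[Path] =
    if (serverAlreadyAssigned) None
    else if (!remote.exists(_.startsWith("local"))) None
    else
      sparkHome
        .map(home => Paths.get(home))
        .filter(home => Files.isDirectory(home))
        .map(home => home.resolve("sbin").resolve("start-connect-server.sh"))
        .filter(script => Files.isRegularFile(script))
}
```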
withLocalConnectServer starts a new Spark Connect server with the following command:
In the end, withLocalConnectServer executes the given f block.
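The overall shape (start a server at most once, then always run the caller's block) can be sketched as a wrapper taking a by-name parameter. LocalServerSketch, shouldStart and startServer are hypothetical; shouldStart stands for the remaining preconditions, and startServer for launching the process with the command above, which this sketch does not reproduce.

```scala
// Hypothetical sketch: S is whatever handle represents the started server.
final class LocalServerSketch[S](shouldStart: () => Boolean, startServer: () => S) {
  // "The server process has not been assigned yet" is modeled by this Option.
  private var server: Option[S] = None

  def withLocalConnectServer[T](f: => T): T = {
    synchronized {
      if (server.isEmpty && shouldStart()) {
        server = Some(startServer())
      }
    }
    f // In the end, run the given block, whether or not a server was started.
  }
}
```

Taking f by name means the block is evaluated only after the server check, which matches the ordering described above.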
withLocalConnectServer is used when:
- SparkSession.Builder is requested to create a SparkSession and to getOrCreate
- ConnectRepl standalone application is launched