SparkSession — Spark Connect Scala Client¶
SparkSession is a Spark Connect Scala Client.
SparkSession is an api.SparkSession.
Spark SQL's SparkSession
There is also a Spark SQL variant of api.SparkSession.
Creating Instance¶
SparkSession takes the following to be created:
- SparkConnectClient
- Plan ID Generator (AtomicLong)
SparkSession is created when:
- SparkSession utility is used to create a SparkSession (for a Configuration)
- SparkSession.Builder is requested to tryCreateSessionFromClient
Create SparkSession¶
create creates a new SparkSession for a SparkConnectClient (based on the given Configuration) and the Plan ID Generator.
create is used when:
- SparkSession utility is used to load a SparkSession from a cache
- SparkSession.Builder is requested to create a SparkSession
Execute Command¶
DeveloperApi
execute is a DeveloperApi.
execute creates a new proto.Plan for the given proto.Command and passes it to executeInternal.
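The wrap-and-delegate step described here can be pictured with a minimal, self-contained sketch. The `Command`, `Plan`, and `Response` case classes and the `ExecuteSketch` object are hypothetical stand-ins, not the generated proto classes or the real client code; the stand-in `executeInternal` only echoes the command so the flow is observable.

```scala
object ExecuteSketch {
  // Stand-in types (assumptions), NOT proto.Command / proto.Plan / ExecutePlanResponse.
  final case class Command(name: String)
  final case class Plan(command: Command)
  final case class Response(commandName: String)

  // Stand-in for executeInternal: echoes the name of the command carried by the plan.
  def executeInternal(plan: Plan): Seq[Response] =
    Seq(Response(plan.command.name))

  // Wrap the given command in a new plan, then delegate to executeInternal.
  def execute(command: Command): Seq[Response] =
    executeInternal(Plan(command))
}
```

The point of the sketch is only the shape: callers hand over a command, and the plan wrapper is created on their behalf before execution.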
execute is used when:
- Dataset is requested to checkpoint and createTempView
- SparkSession is requested to registerUdf
- DataFrameWriterImpl is requested to executeWriteOperation
- DataFrameWriterV2Impl is requested to executeWriteOperation
- MergeIntoWriterImpl is requested to merge
- SessionCleaner is requested to doCleanupCachedRemoteRelation
- DataStreamWriter is requested to start
- RemoteStreamingQuery is requested to executeQueryCmd
- StreamingQueryListenerBus is requested to remove a StreamingQueryListener
- StreamingQueryManager is requested to executeManagerCmd
executeInternal¶
executeInternal requests the SparkConnectClient to execute the given proto.Plan.
With an ExecutePlanResponse, executeInternal runs processRegisteredObservedMetrics.
Run Local Connect Server to Execute Code¶
withLocalConnectServer finds the Spark remote URL based on the following:
- spark.remote in the sparkOptions
- spark.remote in the system properties (-D)
- SPARK_REMOTE environment variable
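As a rough illustration, the lookup can be sketched as a fallback chain, assuming the sources are consulted in the order listed and the first defined value wins. `RemoteLookup` and `resolveRemote` are hypothetical names; the three sources are passed in as plain maps (rather than read from the JVM and the environment) so the sketch stays self-contained.

```scala
object RemoteLookup {
  // Hypothetical helper: returns the first defined remote URL, if any.
  // Assumed order: sparkOptions, then system properties, then environment.
  def resolveRemote(
      sparkOptions: Map[String, String],
      systemProps: Map[String, String],
      env: Map[String, String]): Option[String] =
    sparkOptions.get("spark.remote")
      .orElse(systemProps.get("spark.remote"))
      .orElse(env.get("SPARK_REMOTE"))
}
```

A caller could pass `sys.props.toMap` and `sys.env` for the last two arguments to consult the running JVM and the process environment.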
withLocalConnectServer makes sure that the following are all met before starting up a new Spark Connect server:
- The server process has not been assigned yet
- The Spark remote URL starts with local
- SPARK_HOME environment variable is defined and is a directory with sbin/start-connect-server.sh shell script
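The three conditions above can be modeled as a single predicate. This is a sketch under stated assumptions, not the client's actual code: `LocalServerCheck` and `shouldStartLocalServer` are hypothetical names, and the server state, remote URL, and SPARK_HOME value are passed in as arguments instead of being read from the session or the environment.

```scala
import java.nio.file.{Files, Path, Paths}

object LocalServerCheck {
  // Hypothetical predicate over the three documented conditions.
  def shouldStartLocalServer(
      serverAssigned: Boolean,
      remote: Option[String],
      sparkHome: Option[String]): Boolean = {
    // Location of the start script relative to SPARK_HOME (when defined).
    val script: Option[Path] =
      sparkHome.map(home => Paths.get(home, "sbin", "start-connect-server.sh"))

    !serverAssigned &&                        // no server process assigned yet
      remote.exists(_.startsWith("local")) && // remote URL starts with "local"
      script.exists(p => Files.exists(p))     // script exists under SPARK_HOME
  }
}
```

All three checks must hold; a missing SPARK_HOME or a non-local remote URL (for example one starting with `sc://`) makes the predicate false.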
withLocalConnectServer starts a new Spark Connect server with the following command:
In the end, withLocalConnectServer executes the given f block.
withLocalConnectServer is used when:
- SparkSession.Builder is requested to create a SparkSession and getOrCreate
- ConnectRepl standalone application is launched