SparkSession.Builder
Creating Instance¶
SparkSession.Builder takes no arguments to be created.
SparkSession.Builder is created using the SparkSession.builder utility.
Create SparkSession (with Local Spark Connect Server)
Public API
create is part of the public API.
create runs a local Spark Connect server (unless one has already been started).
create tries to create a brand new SparkSession from this SparkConnectClient (tryCreateSessionFromClient). If that is not possible, create builds a brand new SparkSession from scratch for the Configuration of this SparkConnectClient.Builder.
create then calls setDefaultAndActiveSession and applyOptions, in that order, on the new session.
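The create steps can be sketched in Scala with a few stand-in types. This is a minimal model under stated assumptions, not the Spark Connect client. Configuration, Session, Registry, and Builder are hypothetical simplifications. tryCreateSessionFromClient is stubbed so that it never finds a valid client. setDefaultAndActiveSession is assumed to fill in the default and active sessions only when none is set yet. applyOptions is reduced to copying the builder's options into the session's configuration.

```scala
import scala.collection.mutable

// Hypothetical stand-ins, not the real Spark Connect classes.
final case class Configuration(connectionString: String)

final class Session(val configuration: Configuration) {
  // Stand-in for the session's runtime configuration (session.conf).
  val conf: mutable.Map[String, String] = mutable.Map.empty
}

// Stand-in for the global default and active sessions.
object Registry {
  var defaultSession: Option[Session] = None
  var activeSession: Option[Session] = None

  // Assumption: fills in the default/active session only when none is set yet.
  def setDefaultAndActiveSession(session: Session): Unit = {
    if (defaultSession.isEmpty) defaultSession = Some(session)
    if (activeSession.isEmpty) activeSession = Some(session)
  }
}

final class Builder(configuration: Configuration) {
  private val options = mutable.Map.empty[String, String]

  def config(key: String, value: String): Builder = {
    options(key) = value
    this
  }

  // Stand-in for tryCreateSessionFromClient: this sketch never holds a valid client.
  private def tryCreateSessionFromClient(): Option[Session] = None

  def create(): Session = {
    // From the client if possible, otherwise from scratch for the Configuration.
    val session = tryCreateSessionFromClient().getOrElse(new Session(configuration))
    Registry.setDefaultAndActiveSession(session)
    // Stand-in for applyOptions: copy the builder's options into the session.
    session.conf ++= options
    session
  }
}

val builder = new Builder(Configuration("sc://localhost")).config("spark.sql.shuffle.partitions", "8")
val s1 = builder.create()
val s2 = builder.create()
println(s1 ne s2) // true: create never reuses an existing session
```

Because create always builds a new session, calling it twice yields two distinct sessions. In this model, the first one stays the default and active session.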
Try to Create New Session for SparkConnectClient
Note
Whenever tryCreateSessionFromClient returns a SparkSession, it is a brand new one.
tryCreateSessionFromClient creates a new SparkSession (with this SparkConnectClient and the planIdGenerator) when both of the following are met:
- This SparkConnectClient is available
- The underlying session is valid
Otherwise, tryCreateSessionFromClient returns no SparkSession.
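The two conditions can be expressed as a single Option pipeline. This is a minimal sketch, not the client's actual code. It assumes hypothetical Client and Session classes, where isSessionValid mirrors the validity check, and uses an AtomicLong as a stand-in for the planIdGenerator.

```scala
import java.util.concurrent.atomic.AtomicLong

// Hypothetical stand-in for SparkConnectClient; only the validity check matters here.
final class Client(var sessionValid: Boolean) {
  def isSessionValid: Boolean = sessionValid
}

// Hypothetical stand-in for SparkSession, built on a client and a plan ID generator.
final class Session(val client: Client, val planIdGenerator: AtomicLong)

final class Builder(planIdGenerator: AtomicLong) {
  // The client this builder may hold (None when no client is available).
  var client: Option[Client] = None

  // A brand new Session when a client is available and its session is valid; otherwise None.
  def tryCreateSessionFromClient(): Option[Session] =
    client.filter(_.isSessionValid).map(c => new Session(c, planIdGenerator))
}

val b = new Builder(new AtomicLong(0L))
val noClient = b.tryCreateSessionFromClient()
b.client = Some(new Client(sessionValid = true))
val first = b.tryCreateSessionFromClient()
val second = b.tryCreateSessionFromClient()
b.client.foreach(c => c.sessionValid = false)
val invalid = b.tryCreateSessionFromClient()
```

Each successful call returns a new Session on the same client and generator. Once the client is invalidated, the result is None.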
tryCreateSessionFromClient is used when:
- SparkSession.Builder is requested to create a new SparkSession or getOrCreate a SparkSession
Reuse or Create SparkSession
SparkSessionBuilder
getOrCreate is part of the SparkSessionBuilder (Spark SQL) abstraction.
Public API
getOrCreate is part of the public API.
Generative AI
This description was co-authored using Generative AI tools, namely JetBrains AI Assistant (with the openai-gpt-4o model).
Prompt: "What does getOrCreate do? Please explain every line."
getOrCreate either returns an existing SparkSession (that matches the configuration provided) or creates a new SparkSession instance if no suitable session exists.
getOrCreate ensures that a SparkSession is always available, so callers do not have to check whether a session already exists.
getOrCreate keeps a cache of SparkSessions by their Configuration. It also keeps the default and active sessions and the session configuration up to date.
getOrCreate starts a local Spark Connect server unless one has already been started.
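Starting the server only once can be modelled as a guard that wraps the session-building code. This is a minimal sketch that assumes a process-wide flag protected by a lock. LocalConnectServer and withLocalConnectServer are hypothetical names, and incrementing a counter merely simulates launching the server.

```scala
// Hypothetical start-once guard; the counter simulates launching a local server.
object LocalConnectServer {
  private var started = false
  private var starts = 0

  def startCount: Int = synchronized(starts)

  // Ensures the (simulated) server has been started, then evaluates `f`.
  def withLocalConnectServer[T](f: => T): T = {
    synchronized {
      if (!started) {
        starts += 1 // stand-in for launching the server process
        started = true
      }
    }
    f
  }
}

val x = LocalConnectServer.withLocalConnectServer("first")
val y = LocalConnectServer.withLocalConnectServer("second")
println(LocalConnectServer.startCount) // 1
```

Both calls run their body, but only the first one "starts" the server.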
getOrCreate first tries to create a session from the current SparkConnectClient (tryCreateSessionFromClient). This works only if the client exists and its session is valid. In that case, a brand new SparkSession is built on the existing client.
Otherwise (no client, or its session is not valid), getOrCreate looks up the SparkSession for the Configuration in the sessions cache. If no session is cached yet, a new one is created.
getOrCreate updates the global default and/or active SparkSession in the application (setDefaultAndActiveSession).
getOrCreate applies the spark.sql-prefixed options and the builder's options to the new or existing session.