
SparkSession.Builder

Creating Instance

SparkSession.Builder takes no arguments to be created.

SparkSession.Builder is created using the SparkSession.builder utility.

Create SparkSession (with Local Spark Connect Server)

create(): SparkSession
Public API

create is part of the public API.

create runs a new Spark Connect Server.

create first tries to create a brand new SparkSession from this SparkConnectClient. If that yields no session, create builds a brand new SparkSession from scratch (for the Configuration of this SparkConnectClient.Builder).

create calls setDefaultAndActiveSession and then applyOptions.
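The flow above can be pictured with a small, Spark-free model. This is only a sketch under stated assumptions: `Session`, `Builder`, `fromClient`, `fromScratch`, and the two session slots are hypothetical stand-ins invented for illustration, not Spark classes. The local-server step is left out, and the default/active update is simplified to a plain overwrite.

```scala
import scala.collection.mutable

object CreateSketch {
  // Hypothetical stand-in for SparkSession: an origin tag plus a mutable conf.
  final class Session(val origin: String) {
    val conf: mutable.Map[String, String] = mutable.Map.empty
  }

  // Hypothetical stand-ins for the default and active session slots.
  var defaultSession: Option[Session] = None
  var activeSession: Option[Session] = None

  final class Builder(
      options: Map[String, String],
      fromClient: () => Option[Session], // models tryCreateSessionFromClient
      fromScratch: () => Session) {      // models building a session for the Configuration

    def create(): Session = {
      // 1. Prefer a session built from the existing client, else build from scratch.
      //    getOrElse takes its argument by name, so fromScratch runs only when needed.
      val session = fromClient().getOrElse(fromScratch())
      // 2. Register the session (models setDefaultAndActiveSession).
      //    Simplification: this sketch overwrites both slots unconditionally.
      defaultSession = Some(session)
      activeSession = Some(session)
      // 3. Copy the builder's options onto the session (models applyOptions).
      session.conf ++= options
      session
    }
  }
}
```

Passing the two creation paths in as functions keeps the sketch neutral about when a session can be created from the client, which this page does not spell out.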

Try to Create New Session for SparkConnectClient

tryCreateSessionFromClient(): Option[SparkSession]

Note

When tryCreateSessionFromClient returns a SparkSession, it is always a brand new one.

tryCreateSessionFromClient creates a new SparkSession (with this SparkConnectClient and the planIdGenerator) when all of the following are met:

Otherwise, tryCreateSessionFromClient returns no SparkSession.


tryCreateSessionFromClient is used when:

Reuse or Create SparkSession

SparkSessionBuilder
getOrCreate(): SparkSession

getOrCreate is part of the SparkSessionBuilder (Spark SQL) abstraction.

Public API

getOrCreate is part of the public API.

Generative AI

This description was co-authored using Generative AI tools, namely JetBrains AI Assistant (with the openai-gpt-4o model).

Prompt: "What does getOrCreate do? Please explain every line."

getOrCreate either returns an existing SparkSession (that matches the configuration provided) or creates a new SparkSession instance if no suitable session exists.

getOrCreate is used to ensure that a SparkSession is always available without having to worry whether a session already exists.

getOrCreate uses a caching mechanism alongside proper configuration and session updates.


getOrCreate starts a local Spark Connect server unless already started.

getOrCreate attempts to create a session by directly reusing the current client (if it exists and is valid). If a valid client exists, a new SparkSession is built and returned immediately.

Otherwise (if no valid client exists or the session is not reusable), getOrCreate looks up the SparkSession in sessions (by the Configuration) or creates a new session.

getOrCreate updates the global default and/or active SparkSession in the application.

getOrCreate applies spark.sql-prefixed options and the builder's options to the new or existing session.
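The lookup-or-create step can be illustrated with a tiny cache keyed by configuration. This is a minimal sketch, not Spark's implementation: `Configuration`, `Session`, and the `sessions` map below are hypothetical stand-ins. The local server, client reuse, and default/active session updates are omitted, and the sketch does not treat spark.sql-prefixed options separately.

```scala
import scala.collection.mutable

object GetOrCreateSketch {
  // Hypothetical stand-in for the builder's Configuration (used as the cache key).
  final case class Configuration(connectionString: String)

  // Hypothetical stand-in for SparkSession.
  final class Session(val configuration: Configuration) {
    val conf: mutable.Map[String, String] = mutable.Map.empty
  }

  // Models the sessions registry: at most one cached session per Configuration.
  // Not thread-safe; a real registry would need synchronization.
  private val sessions = mutable.Map.empty[Configuration, Session]

  def getOrCreate(configuration: Configuration, options: Map[String, String]): Session = {
    // Reuse the cached session for this Configuration, or create and cache a new one.
    val session = sessions.getOrElseUpdate(configuration, new Session(configuration))
    // Options are applied whether the session is new or reused.
    session.conf ++= options
    session
  }
}
```

In this model, two calls with an equal `Configuration` return the same `Session` instance, and options from the later call are layered on top of the earlier ones.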