SessionState — State Separation Layer Between SparkSessions¶
SessionState is a state separation layer between Spark SQL sessions, including SQL configuration, tables, functions, UDFs, the SQL parser, and everything else that depends on a SQLConf.
Attributes¶
Adaptive Rules¶
adaptiveRulesHolder: AdaptiveRulesHolder
User-Defined Adaptive Query Rules
adaptiveRulesHolder is given when SessionState is created.

adaptiveRulesHolder is used when the AdaptiveSparkPlanExec physical operator is requested for the following:
- Executing AQE Query Post Planner Strategy Rules
- Adaptive Logical Optimizer
- Adaptive Query Stage Physical Optimizations
- Adaptive Query Stage Physical Preparation Rules
ColumnarRules¶
columnarRules: Seq[ColumnarRule]
ExecutionListenerManager¶
listenerManager: ExecutionListenerManager
ExperimentalMethods¶
experimentalMethods: ExperimentalMethods
FunctionRegistry¶
functionRegistry: FunctionRegistry
Logical Analyzer¶
analyzer: Analyzer
Initialized lazily (only when requested the first time) using the analyzerBuilder factory function.
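The builder-function pattern can be sketched in plain Scala (illustrative names, not Spark's actual classes): a `lazy val` defers the factory call until the field is first read and then caches the result.

```scala
// Sketch of lazy initialization from a builder function.
// LazySessionState and its String "analyzer" are illustrative stand-ins.
var builderCalls = 0

class LazySessionState(analyzerBuilder: () => String) {
  // lazy val defers calling the builder until the field is first read
  lazy val analyzer: String = analyzerBuilder()
}

val state = new LazySessionState(() => { builderCalls += 1; "Analyzer" })
assert(builderCalls == 0) // nothing built yet
val a1 = state.analyzer   // first access invokes the builder
val a2 = state.analyzer   // cached; the builder is not invoked again
println(s"builder invoked $builderCalls time(s)")
```

The same caching behavior applies to the other lazily-built components (the Logical Optimizer, the SessionCatalog, and so on).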
Logical Optimizer¶
optimizer: Optimizer
Logical Optimizer that is created using the optimizerBuilder function (and cached for later usage).
Used when:

- QueryExecution is requested to create an optimized logical plan
- (Structured Streaming) IncrementalExecution is requested to create an optimized logical plan
ParserInterface¶
sqlParser: ParserInterface
SessionCatalog¶
catalog: SessionCatalog
SessionCatalog that is created using the catalogBuilder function (and cached for later usage).
SessionResourceLoader¶
resourceLoader: SessionResourceLoader
Spark Query Planner¶
planner: SparkPlanner
SQLConf¶
conf: SQLConf
StreamingQueryManager¶
streamingQueryManager: StreamingQueryManager
UDFRegistration¶
udfRegistration: UDFRegistration
SessionState is given a UDFRegistration when created.
AQE QueryStage Physical Preparation Rules¶
queryStagePrepRules: Seq[Rule[SparkPlan]]
SessionState can be given a collection of physical optimizations (Rule[SparkPlan]s) when created.

queryStagePrepRules is given when BaseSessionStateBuilder is requested to build a SessionState based on queryStagePrepRules (from a SparkSessionExtensions).

queryStagePrepRules is used to extend the built-in QueryStage Physical Preparation Rules in Adaptive Query Execution.
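The extension mechanism amounts to appending user rules after the built-in ones. A minimal sketch (rule names here are purely illustrative, not Spark's internals):

```scala
// Sketch: built-in AQE preparation rules come first, followed by any
// user-defined rules registered via SparkSessionExtensions.
def preparationRules(
    builtInRules: Seq[String],
    userRules: Seq[String]): Seq[String] =
  builtInRules ++ userRules

val rules = preparationRules(
  Seq("EnsureRequirements", "ValidateSparkPlan"),
  Seq("MyCustomPrepRule"))
println(rules.mkString(", "))
```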
Creating Instance¶
SessionState takes the following to be created:

- SQLConf
- ExperimentalMethods
- FunctionRegistry
- UDFRegistration
- Function to build a SessionCatalog (`() => SessionCatalog`)
- ParserInterface
- Function to build an Analyzer (`() => Analyzer`)
- Function to build a Logical Optimizer (`() => Optimizer`)
- SparkPlanner
- Function to build a StreamingQueryManager (`() => StreamingQueryManager`)
- ExecutionListenerManager
- Function to build a SessionResourceLoader (`() => SessionResourceLoader`)
- Function to build a QueryExecution (`LogicalPlan => QueryExecution`)
- SessionState Clone Function (`(SparkSession, SessionState) => SessionState`)
- ColumnarRules
- AQE Rules
- planNormalizationRules
SessionState is created when:

- SparkSession is requested to instantiateSessionState (when requested for the SessionState per the spark.sql.catalogImplementation configuration property)

When requested for the SessionState, SparkSession uses the spark.sql.catalogImplementation configuration property to load and create a BaseSessionStateBuilder that is then requested to create a SessionState instance.

There are two BaseSessionStateBuilders available:

- (default) SessionStateBuilder for the in-memory catalog
- HiveSessionStateBuilder for the hive catalog

The hive catalog is used when the SparkSession was created with Hive support enabled (using Builder.enableHiveSupport).
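The selection boils down to mapping the configuration value to a builder class name. A simplified sketch (the class names follow the Spark sources, but the lookup logic here is illustrative):

```scala
// Sketch of the builder-class selection driven by
// spark.sql.catalogImplementation ("in-memory" or "hive").
def sessionStateBuilderClassName(catalogImplementation: String): String =
  catalogImplementation match {
    case "hive" => "org.apache.spark.sql.hive.HiveSessionStateBuilder"
    // "in-memory" is the default catalog implementation
    case _      => "org.apache.spark.sql.internal.SessionStateBuilder"
  }

val builderClass = sessionStateBuilderClassName("in-memory")
println(builderClass)
```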
Creating QueryExecution For LogicalPlan¶
executePlan(
plan: LogicalPlan): QueryExecution
executePlan uses the createQueryExecution function to create a QueryExecution for the given LogicalPlan.
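Since createQueryExecution is held as a function value, executePlan is little more than a function application. A type-simplified sketch (ExecutePlanSketch and the type parameters stand in for SessionState, LogicalPlan, and QueryExecution):

```scala
// Sketch: SessionState stores createQueryExecution as a function value
// and executePlan simply applies it to the given plan.
class ExecutePlanSketch[Plan, QE](createQueryExecution: Plan => QE) {
  def executePlan(plan: Plan): QE = createQueryExecution(plan)
}

// Toy instantiation: a "plan" is a String, a "query execution" its length.
val sketch = new ExecutePlanSketch[String, Int](plan => plan.length)
val result = sketch.executePlan("SELECT 1")
println(result)
```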
Creating New Hadoop Configuration¶
newHadoopConf(): Configuration
newHadoopConf returns a new Hadoop Configuration (with the SparkContext.hadoopConfiguration and all the configuration properties of the SQLConf).
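The merge semantics can be sketched with plain Maps standing in for the Hadoop Configuration and the SQLConf (an assumption for illustration; the real types are mutable Hadoop/Spark classes): SQLConf entries are applied on top of the base Hadoop configuration.

```scala
// Sketch: start from the SparkContext's Hadoop configuration and
// overlay every SQLConf property; later entries win on key conflicts.
def newHadoopConfSketch(
    hadoopConf: Map[String, String],
    sqlConfProps: Map[String, String]): Map[String, String] =
  hadoopConf ++ sqlConfProps

val merged = newHadoopConfSketch(
  Map("fs.defaultFS" -> "hdfs://nn:8020", "io.file.buffer.size" -> "4096"),
  Map("io.file.buffer.size" -> "65536"))
println(merged("io.file.buffer.size")) // the SQLConf entry wins
```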
Creating New Hadoop Configuration With Extra Options¶
newHadoopConfWithOptions(
options: Map[String, String]): Configuration
newHadoopConfWithOptions creates a new Hadoop Configuration with the input options set (except the path and paths options, which are skipped).

newHadoopConfWithOptions is used when:

- TextBasedFileFormat is requested to isSplitable
- FileSourceScanExec physical operator is requested for the input RDD
- InsertIntoHadoopFsRelationCommand logical command is executed
- PartitioningAwareFileIndex is requested for the Hadoop Configuration
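The path/paths filtering can be sketched as follows (a plain Map stands in for the Hadoop Configuration; an illustrative helper, not Spark's code):

```scala
// Sketch: copy every input option into the configuration except the
// "path" and "paths" keys, which newHadoopConfWithOptions skips.
def settableOptions(options: Map[String, String]): Map[String, String] =
  options.filter { case (key, _) => key != "path" && key != "paths" }

val kept = settableOptions(
  Map("path" -> "/tmp/in", "paths" -> "a,b", "compression" -> "gzip"))
println(kept)
```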
Accessing SessionState¶
SessionState is available using SparkSession.sessionState.
```scala
import org.apache.spark.sql.SparkSession
assert(spark.isInstanceOf[SparkSession])

// object SessionState in package org.apache.spark.sql.internal cannot be accessed directly
scala> :type spark.sessionState
org.apache.spark.sql.internal.SessionState
```