SessionState — State Separation Layer Between SparkSessions¶
SessionState is a state separation layer between Spark SQL sessions, including SQL configuration, tables, functions, UDFs, the SQL parser, and everything else that depends on a SQLConf.
Attributes¶
Adaptive Rules¶
adaptiveRulesHolder: AdaptiveRulesHolder
User-Defined Adaptive Query Rules
adaptiveRulesHolder is given when SessionState is created.

adaptiveRulesHolder is used when the AdaptiveSparkPlanExec physical operator is requested for the following:
- Executing AQE Query Post Planner Strategy Rules
- Adaptive Logical Optimizer
- Adaptive Query Stage Physical Optimizations
- Adaptive Query Stage Physical Preparation Rules
ColumnarRules¶
columnarRules: Seq[ColumnarRule]
ExecutionListenerManager¶
listenerManager: ExecutionListenerManager
ExperimentalMethods¶
experimentalMethods: ExperimentalMethods
FunctionRegistry¶
functionRegistry: FunctionRegistry
Logical Analyzer¶
analyzer: Analyzer
Initialized lazily (only when requested the first time) using the analyzerBuilder factory function.
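The builder-function pattern can be sketched in plain Scala (illustrative names, not Spark's actual classes): a `lazy val` defers the factory call until the field is first read and then caches the result.

```scala
// Sketch of lazy initialization from a builder function.
// LazySessionState and its String "analyzer" are illustrative stand-ins.
var builderCalls = 0

class LazySessionState(analyzerBuilder: () => String) {
  // lazy val defers calling the builder until the field is first read
  lazy val analyzer: String = analyzerBuilder()
}

val state = new LazySessionState(() => { builderCalls += 1; "Analyzer" })
assert(builderCalls == 0) // nothing built yet
val a1 = state.analyzer   // first access invokes the builder
val a2 = state.analyzer   // cached; the builder is not invoked again
println(s"builder invoked $builderCalls time(s)")
```

The same caching behavior applies to the other lazily-built components (the Logical Optimizer, the SessionCatalog, and so on).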
Logical Optimizer¶
optimizer: Optimizer
Logical Optimizer that is created using the optimizerBuilder function (and cached for later usage).
Used when:

- QueryExecution is requested to create an optimized logical plan
- (Structured Streaming) IncrementalExecution is requested to create an optimized logical plan
ParserInterface¶
sqlParser: ParserInterface
SessionCatalog¶
catalog: SessionCatalog
SessionCatalog that is created using the catalogBuilder function (and cached for later usage).
SessionResourceLoader¶
resourceLoader: SessionResourceLoader
Spark Query Planner¶
planner: SparkPlanner
SQLConf¶
conf: SQLConf
StreamingQueryManager¶
streamingQueryManager: StreamingQueryManager
UDFRegistration¶
udfRegistration: UDFRegistration
SessionState is given a UDFRegistration when created.
AQE QueryStage Physical Preparation Rules¶
queryStagePrepRules: Seq[Rule[SparkPlan]]
SessionState can be given a collection of physical optimizations (Rule[SparkPlan]s) when created.

queryStagePrepRules is given when BaseSessionStateBuilder is requested to build a SessionState based on queryStagePrepRules (from a SparkSessionExtensions).

queryStagePrepRules is used to extend the built-in QueryStage Physical Preparation Rules in Adaptive Query Execution.
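The extension mechanism amounts to appending user rules after the built-in ones. A minimal sketch (rule names here are purely illustrative, not Spark's internals):

```scala
// Sketch: built-in AQE preparation rules come first, followed by any
// user-defined rules registered via SparkSessionExtensions.
def preparationRules(
    builtInRules: Seq[String],
    userRules: Seq[String]): Seq[String] =
  builtInRules ++ userRules

val rules = preparationRules(
  Seq("EnsureRequirements", "ValidateSparkPlan"),
  Seq("MyCustomPrepRule"))
println(rules.mkString(", "))
```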
Creating Instance¶
SessionState takes the following to be created:

- SQLConf
- ExperimentalMethods
- FunctionRegistry
- UDFRegistration
- Function to build a SessionCatalog (`() => SessionCatalog`)
- ParserInterface
- Function to build an Analyzer (`() => Analyzer`)
- Function to build a Logical Optimizer (`() => Optimizer`)
- SparkPlanner
- Function to build a StreamingQueryManager (`() => StreamingQueryManager`)
- ExecutionListenerManager
- Function to build a SessionResourceLoader (`() => SessionResourceLoader`)
- Function to build a QueryExecution (`LogicalPlan => QueryExecution`)
- SessionState Clone Function (`(SparkSession, SessionState) => SessionState`)
- ColumnarRules
- AQE Rules
- planNormalizationRules
SessionState is created when:

- SparkSession is requested to instantiateSessionState (when requested for the SessionState per the spark.sql.catalogImplementation configuration property)

When requested for the SessionState, SparkSession uses the spark.sql.catalogImplementation configuration property to load and create a BaseSessionStateBuilder that is then requested to create a SessionState instance.

There are two BaseSessionStateBuilders available:

- (default) SessionStateBuilder for the in-memory catalog
- HiveSessionStateBuilder for the hive catalog

The hive catalog is used when the SparkSession was created with Hive support enabled (using Builder.enableHiveSupport).
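The selection boils down to mapping the configuration value to a builder class name. A simplified sketch (the class names follow the Spark sources, but the lookup logic here is illustrative):

```scala
// Sketch of the builder-class selection driven by
// spark.sql.catalogImplementation ("in-memory" or "hive").
def sessionStateBuilderClassName(catalogImplementation: String): String =
  catalogImplementation match {
    case "hive" => "org.apache.spark.sql.hive.HiveSessionStateBuilder"
    // "in-memory" is the default catalog implementation
    case _      => "org.apache.spark.sql.internal.SessionStateBuilder"
  }

val builderClass = sessionStateBuilderClassName("in-memory")
println(builderClass)
```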
Creating QueryExecution For LogicalPlan¶
executePlan(
plan: LogicalPlan): QueryExecution
executePlan uses the createQueryExecution function to create a QueryExecution for the given LogicalPlan.
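Since createQueryExecution is held as a function value, executePlan is little more than a function application. A type-simplified sketch (ExecutePlanSketch and the type parameters stand in for SessionState, LogicalPlan, and QueryExecution):

```scala
// Sketch: SessionState stores createQueryExecution as a function value
// and executePlan simply applies it to the given plan.
class ExecutePlanSketch[Plan, QE](createQueryExecution: Plan => QE) {
  def executePlan(plan: Plan): QE = createQueryExecution(plan)
}

// Toy instantiation: a "plan" is a String, a "query execution" its length.
val sketch = new ExecutePlanSketch[String, Int](plan => plan.length)
val result = sketch.executePlan("SELECT 1")
println(result)
```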
Creating New Hadoop Configuration¶
newHadoopConf(): Configuration
newHadoopConf returns a new Hadoop Configuration (with the SparkContext.hadoopConfiguration and all the configuration properties of the SQLConf).
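The merge semantics can be sketched with plain Maps standing in for the Hadoop Configuration and the SQLConf (an assumption for illustration; the real types are mutable Hadoop/Spark classes): SQLConf entries are applied on top of the base Hadoop configuration.

```scala
// Sketch: start from the SparkContext's Hadoop configuration and
// overlay every SQLConf property; later entries win on key conflicts.
def newHadoopConfSketch(
    hadoopConf: Map[String, String],
    sqlConfProps: Map[String, String]): Map[String, String] =
  hadoopConf ++ sqlConfProps

val merged = newHadoopConfSketch(
  Map("fs.defaultFS" -> "hdfs://nn:8020", "io.file.buffer.size" -> "4096"),
  Map("io.file.buffer.size" -> "65536"))
println(merged("io.file.buffer.size")) // the SQLConf entry wins
```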
Creating New Hadoop Configuration With Extra Options¶
newHadoopConfWithOptions(
options: Map[String, String]): Configuration
newHadoopConfWithOptions creates a new Hadoop Configuration with the input options set (except the path and paths options, which are skipped).

newHadoopConfWithOptions is used when:

- TextBasedFileFormat is requested to isSplitable
- FileSourceScanExec physical operator is requested for the input RDD
- InsertIntoHadoopFsRelationCommand logical command is executed
- PartitioningAwareFileIndex is requested for the Hadoop Configuration
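The path/paths filtering can be sketched as follows (a plain Map stands in for the Hadoop Configuration; an illustrative helper, not Spark's code):

```scala
// Sketch: copy every input option into the configuration except the
// "path" and "paths" keys, which newHadoopConfWithOptions skips.
def settableOptions(options: Map[String, String]): Map[String, String] =
  options.filter { case (key, _) => key != "path" && key != "paths" }

val kept = settableOptions(
  Map("path" -> "/tmp/in", "paths" -> "a,b", "compression" -> "gzip"))
println(kept)
```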
Accessing SessionState¶
SessionState is available using SparkSession.sessionState.
```scala
import org.apache.spark.sql.SparkSession
assert(spark.isInstanceOf[SparkSession])

// object SessionState in package org.apache.spark.sql.internal cannot be accessed directly
scala> :type spark.sessionState
org.apache.spark.sql.internal.SessionState
```