SparkSession.Builder
SparkSession.Builder is a builder interface to create SparkSessions.
Accessing Builder
Builder is available using the SparkSession.builder factory method.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder
  .appName("My Spark Application") // optional; a name is generated if not specified
  .master("local[*]") // for demos and testing only; use spark-submit otherwise
  .enableHiveSupport() // requires Hive classes on the classpath
  .config("spark.sql.warehouse.dir", "target/spark-warehouse")
  .withExtensions { extensions =>
    extensions.injectResolutionRule { session =>
      ...
    }
    extensions.injectOptimizerRule { session =>
      ...
    }
  }
  .getOrCreate()
Enabling Hive Support
enableHiveSupport(): Builder
enableHiveSupport enables Hive support.
Note
You do not need an existing Hive installation to use Spark's Hive support. SparkSession automatically creates metastore_db in the current directory of the Spark application, as well as the directory configured by the spark.sql.warehouse.dir configuration property.
See SharedState.
Internally, enableHiveSupport checks whether Hive classes are available (using hiveClassesArePresent). If so, enableHiveSupport sets the spark.sql.catalogImplementation internal configuration property to hive. Otherwise, enableHiveSupport throws an IllegalArgumentException:
Unable to instantiate SparkSession with Hive support because Hive classes are not found.
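The behaviour described above can be handled defensively in application code. The following is a minimal sketch, not part of Spark itself: builderWithOptionalHive and HiveSupportExample are hypothetical names, and the code assumes spark-sql is on the classpath (spark-hive may or may not be).

```scala
import org.apache.spark.sql.SparkSession

object HiveSupportExample {
  // Hypothetical helper: tries to enable Hive support and keeps the builder
  // unchanged when enableHiveSupport reports that Hive classes are missing.
  def builderWithOptionalHive(base: SparkSession.Builder): SparkSession.Builder =
    try {
      base.enableHiveSupport()
    } catch {
      case _: IllegalArgumentException => base
    }

  def main(args: Array[String]): Unit = {
    val spark = builderWithOptionalHive(
      SparkSession.builder.master("local[*]").appName("hive-support-demo")
    ).getOrCreate()

    // Reads back the catalog implementation the session ended up with
    println(spark.conf.get("spark.sql.catalogImplementation"))
    spark.stop()
  }
}
```

Catching the exception is only one option; failing fast may be preferable when the application cannot work without a Hive metastore.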
Getting Or Creating SparkSession Instance
getOrCreate(): SparkSession
getOrCreate gives the active SparkSession or creates a new one.
While creating a new one, getOrCreate finds the SparkSession extensions (based on the spark.sql.extensions configuration property) and applies them to the SparkSessionExtensions.
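The "get or create" behaviour can be observed by calling getOrCreate twice. This is a small sketch that assumes a local master and no other SparkSession running in the JVM; GetOrCreateExample is a hypothetical name.

```scala
import org.apache.spark.sql.SparkSession

object GetOrCreateExample {
  def main(args: Array[String]): Unit = {
    // No session exists yet, so this call creates one
    val first = SparkSession.builder
      .master("local[*]")
      .appName("get-or-create-demo")
      .getOrCreate()

    // A session is now active, so this call returns it instead of creating another
    val second = SparkSession.builder.getOrCreate()

    println(first eq second) // reference equality: both names point to one session

    first.stop()
  }
}
```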
SparkSessionExtensions
extensions: SparkSessionExtensions
Builder creates a new SparkSessionExtensions when created.
The SparkSessionExtensions is used to apply SparkSession extensions registered using the spark.sql.extensions configuration property or the Builder.withExtensions method.
In the end, Builder uses the SparkSessionExtensions to create a new SparkSession.
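As an illustration of the configuration-based route, the sketch below registers an extension class through spark.sql.extensions. DemoExtensions and NoopOptimizerRule are hypothetical names invented for this example; the rule deliberately returns the plan unchanged. The sketch assumes the extension class has a no-argument constructor and is a function from SparkSessionExtensions to Unit.

```scala
import org.apache.spark.sql.{SparkSession, SparkSessionExtensions}
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.rules.Rule

// Hypothetical optimizer rule that leaves every plan untouched
case class NoopOptimizerRule(session: SparkSession) extends Rule[LogicalPlan] {
  override def apply(plan: LogicalPlan): LogicalPlan = plan
}

// Hypothetical extension class referenced by name in spark.sql.extensions
class DemoExtensions extends (SparkSessionExtensions => Unit) {
  override def apply(extensions: SparkSessionExtensions): Unit =
    extensions.injectOptimizerRule(session => NoopOptimizerRule(session))
}

object ExtensionsByConfigExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .master("local[*]")
      .appName("extensions-by-config-demo")
      // The property takes the fully-qualified class name of the extension class
      .config("spark.sql.extensions", classOf[DemoExtensions].getName)
      .getOrCreate()

    // Any query goes through the optimizer, including the injected rule
    spark.range(3).count()
    spark.stop()
  }
}
```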
Registering SparkSessionExtensions
withExtensions(f: SparkSessionExtensions => Unit): Builder
withExtensions allows registering SparkSession extensions using SparkSessionExtensions.
withExtensions simply executes the input f function with the SparkSessionExtensions of the Builder.
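A programmatic registration could look like the sketch below. PassThroughRule and WithExtensionsExample are hypothetical names used only for illustration, and the rule is intentionally a no-op so that the example focuses on the registration itself.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.rules.Rule

// Hypothetical optimizer rule that returns the plan it is given
case class PassThroughRule(session: SparkSession) extends Rule[LogicalPlan] {
  override def apply(plan: LogicalPlan): LogicalPlan = plan
}

object WithExtensionsExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .master("local[*]")
      .appName("with-extensions-demo")
      // The function receives the SparkSessionExtensions and registers the rule
      .withExtensions { extensions =>
        extensions.injectOptimizerRule(session => PassThroughRule(session))
      }
      .getOrCreate()

    // Run a query so that the optimizer (and the injected rule) is exercised
    spark.range(5).filter("id > 2").count()
    spark.stop()
  }
}
```

Compared with the spark.sql.extensions property, this route needs no separate extension class, at the cost of hard-coding the registration in the application.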
hiveClassesArePresent
hiveClassesArePresent: Boolean
hiveClassesArePresent loads and initializes the org.apache.spark.sql.hive.HiveSessionStateBuilder and org.apache.hadoop.hive.conf.HiveConf classes from the current classloader.
hiveClassesArePresent returns true when the initialization succeeded, and false otherwise (due to ClassNotFoundException or NoClassDefFoundError errors).
hiveClassesArePresent is used when:
- Builder is requested to enableHiveSupport
- spark-shell is executed
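The class-presence check described in this section can be approximated in plain Scala. This is a sketch of the general technique, not Spark's actual code: classesArePresent and ClassPresenceCheck are hypothetical names, and the choice of the thread's context classloader (with a fallback) is an assumption.

```scala
object ClassPresenceCheck {
  // Returns true when every named class can be loaded and initialized,
  // and false when loading fails with one of the two expected errors.
  def classesArePresent(classNames: Seq[String]): Boolean =
    try {
      // Assumption: prefer the context classloader, fall back to this class's loader
      val loader = Option(Thread.currentThread().getContextClassLoader)
        .getOrElse(getClass.getClassLoader)
      // initialize = true also runs static initializers of the loaded classes
      classNames.foreach(name => Class.forName(name, true, loader))
      true
    } catch {
      case _: ClassNotFoundException | _: NoClassDefFoundError => false
    }

  def main(args: Array[String]): Unit = {
    println(classesArePresent(Seq("java.lang.String")))         // prints "true"
    println(classesArePresent(Seq("no.such.pkg.MissingClass"))) // prints "false"
    // The two class names named in this section; the result depends on the classpath
    println(classesArePresent(Seq(
      "org.apache.spark.sql.hive.HiveSessionStateBuilder",
      "org.apache.hadoop.hive.conf.HiveConf")))
  }
}
```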