SparkSession.Builder¶
SparkSession.Builder is a builder interface to create SparkSessions.
Accessing Builder¶
Builder is available using the SparkSession.builder factory method.
import org.apache.spark.sql.SparkSession
val spark = SparkSession.builder
  .appName("My Spark Application") // optional and will be autogenerated if not specified
  .master("local[*]") // only for demo and testing purposes, use spark-submit instead
  .enableHiveSupport() // self-explanatory, isn't it?
  .config("spark.sql.warehouse.dir", "target/spark-warehouse")
  .withExtensions { extensions =>
    extensions.injectResolutionRule { session =>
      ...
    }
    extensions.injectOptimizerRule { session =>
      ...
    }
  }
  .getOrCreate()
Enabling Hive Support¶
enableHiveSupport(): Builder
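enableHiveSupport enables Hive support, i.e. connectivity to a persistent Hive metastore, support for Hive SerDes, and Hive user-defined functions.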
Note
You do not need an existing Hive installation to use Spark's Hive support. SparkSession automatically creates metastore_db in the current directory of the Spark application, as well as the directory configured by the spark.sql.warehouse.dir configuration property.
Consult SharedState.
Internally, enableHiveSupport checks whether the Hive classes are available (using hiveClassesArePresent). If so, enableHiveSupport sets the spark.sql.catalogImplementation internal configuration property to hive. Otherwise, enableHiveSupport throws an IllegalArgumentException:
Unable to instantiate SparkSession with Hive support because Hive classes are not found.
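The check described above can be sketched without Spark on the classpath. This is a simplified model, not Spark's source: enableHiveSupportSketch, classesPresent and options are hypothetical names that stand in for the Builder method, the result of hiveClassesArePresent, and the Builder's option map.

```scala
// Simplified, Spark-free model of the enableHiveSupport decision.
// classesPresent: stands in for the result of hiveClassesArePresent (assumption).
// options: stands in for the options collected by the Builder (assumption).
def enableHiveSupportSketch(
    classesPresent: Boolean,
    options: Map[String, String]): Map[String, String] =
  if (classesPresent) {
    // Hive classes found: switch the catalog implementation to hive.
    options + ("spark.sql.catalogImplementation" -> "hive")
  } else {
    // Hive classes missing: fail fast with the message quoted above.
    throw new IllegalArgumentException(
      "Unable to instantiate SparkSession with Hive support because " +
        "Hive classes are not found.")
  }
```

The sketch returns a new map rather than mutating state, which keeps the two outcomes (property set or exception thrown) easy to see.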
Getting Or Creating SparkSession Instance¶
getOrCreate(): SparkSession
getOrCreate gives the active SparkSession or creates a new one.
While creating a new one, getOrCreate finds the SparkSession extensions (based on spark.sql.extensions configuration property) and applies them to the SparkSessionExtensions.
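The reuse-or-create behaviour can be illustrated with a small Spark-free model. SessionSketch and SessionRegistrySketch are hypothetical names introduced for this sketch only. The real getOrCreate does more (for example, applying extensions as described above), so treat this as an illustration of the decision, not of the implementation.

```scala
// Hypothetical stand-in for a SparkSession: it only remembers its options.
final class SessionSketch(val options: Map[String, String])

// Hypothetical registry that models "return the active session or create one".
object SessionRegistrySketch {
  private var active: Option[SessionSketch] = None

  def getOrCreate(options: Map[String, String]): SessionSketch = synchronized {
    active.getOrElse {
      // No active session yet: create one and remember it for later calls.
      val created = new SessionSketch(options)
      active = Some(created)
      created
    }
  }
}
```

In this model a second call returns the very same instance, so options passed to the second call do not produce a new session.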
SparkSessionExtensions¶
extensions: SparkSessionExtensions
Builder creates a new SparkSessionExtensions when created.
The SparkSessionExtensions is used to apply SparkSession extensions registered using spark.sql.extensions configuration property or Builder.withExtensions method.
In the end, Builder uses the SparkSessionExtensions to create a new SparkSession.
Registering SparkSessionExtensions¶
withExtensions(
f: SparkSessionExtensions => Unit): Builder
withExtensions allows registering SparkSession extensions using SparkSessionExtensions.
withExtensions simply executes the input f function with the Builder's SparkSessionExtensions.
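A minimal Spark-free sketch of that behaviour follows. ExtensionsSketch and BuilderSketch are hypothetical stand-ins for SparkSessionExtensions and Builder, and the inject method is invented for illustration (it only records a rule name). The point is that withExtensions runs f immediately against the builder-owned extensions object and returns the builder for chaining.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for SparkSessionExtensions: records injected rule names.
final class ExtensionsSketch {
  val injected: ArrayBuffer[String] = ArrayBuffer.empty[String]
  def inject(ruleName: String): Unit = injected += ruleName
}

// Hypothetical stand-in for SparkSession.Builder.
final class BuilderSketch {
  // Created together with the builder, as the Builder does with its extensions.
  private val extensions = new ExtensionsSketch

  def withExtensions(f: ExtensionsSketch => Unit): BuilderSketch = {
    f(extensions) // execute the user-supplied function right away
    this          // return the builder so calls can be chained
  }

  def registered: Seq[String] = extensions.injected.toSeq
}
```

Because every call shares the same extensions object, several withExtensions calls on one builder accumulate their registrations.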
hiveClassesArePresent¶
hiveClassesArePresent: Boolean
hiveClassesArePresent loads and initializes org.apache.spark.sql.hive.HiveSessionStateBuilder and org.apache.hadoop.hive.conf.HiveConf classes from the current classloader.
hiveClassesArePresent returns true when the initialization succeeds, and false otherwise (due to a ClassNotFoundException or a NoClassDefFoundError).
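The same load-and-catch pattern can be sketched with plain JVM reflection. classesArePresent is a hypothetical helper, not Spark's method, and it assumes Class.forName with the caller's class loader is an acceptable substitute for whatever class-loading utility Spark uses internally.

```scala
// Returns true only if every named class can be loaded and initialized.
def classesArePresent(classNames: Seq[String]): Boolean =
  try {
    // Class.forName(name) loads and initializes the class.
    classNames.foreach(name => Class.forName(name))
    true
  } catch {
    // The same two failure modes the text names for hiveClassesArePresent.
    case _: ClassNotFoundException | _: NoClassDefFoundError => false
  }
```

Under these assumptions, calling classesArePresent with the two class names listed above would approximate the Hive check on a given classpath.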
hiveClassesArePresent is used when:
- Builder is requested to enableHiveSupport
- spark-shell is executed