SparkSession.Builder

SparkSession.Builder is a builder interface to create SparkSessions.

Accessing Builder

Builder is available using the SparkSession.builder factory method.

import org.apache.spark.sql.SparkSession
val spark = SparkSession.builder
  .appName("My Spark Application")  // optional and will be autogenerated if not specified
  .master("local[*]")               // only for demo and testing purposes, use spark-submit instead
  .enableHiveSupport()              // self-explanatory, isn't it?
  .config("spark.sql.warehouse.dir", "target/spark-warehouse")
  .withExtensions { extensions =>
    extensions.injectResolutionRule { session =>
      ...
    }
    extensions.injectOptimizerRule { session =>
      ...
    }
  }
  .getOrCreate()

Enabling Hive Support

enableHiveSupport(): Builder

enableHiveSupport enables Hive support.

Note

You do not need an existing Hive installation to use Spark's Hive support. SparkSession automatically creates metastore_db in the current directory of the Spark application, as well as the directory configured by the spark.sql.warehouse.dir configuration property.

Consult SharedState.

Internally, enableHiveSupport checks whether the Hive classes are available. If they are, enableHiveSupport sets the spark.sql.catalogImplementation internal configuration property to hive. Otherwise, enableHiveSupport throws an IllegalArgumentException:

Unable to instantiate SparkSession with Hive support because Hive classes are not found.
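
The behaviour described above can be observed with a minimal sketch like the one below. It assumes Spark SQL is on the classpath (spark-hive is optional). The HiveSupportDemo object, the application name, and the fallback to a plain builder are illustrative choices and not part of Spark.

```scala
import org.apache.spark.sql.SparkSession
import scala.util.{Failure, Success, Try}

// Hypothetical helper object, used for illustration only
object HiveSupportDemo {

  // Returns the value of spark.sql.catalogImplementation for the session in use
  def catalogImplementation(): String = {
    // enableHiveSupport throws IllegalArgumentException when Hive classes are missing
    val builder = Try(SparkSession.builder.enableHiveSupport()) match {
      case Success(b) => b
      case Failure(_: IllegalArgumentException) =>
        // Assumption for this demo: fall back to a builder without Hive support
        SparkSession.builder
      case Failure(e) => throw e
    }
    val spark = builder
      .master("local[*]")               // demo only
      .appName("hive-support-demo")     // hypothetical name
      .getOrCreate()
    try spark.conf.get("spark.sql.catalogImplementation")
    finally spark.stop()
  }

  def main(args: Array[String]): Unit =
    println(catalogImplementation())
}
```

With spark-hive on the classpath the printed value should be hive. Without it, the fallback builder leaves the property at its default.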

Getting Or Creating SparkSession Instance

getOrCreate(): SparkSession

getOrCreate gives the active SparkSession or creates a new one.

While creating a new one, getOrCreate finds the SparkSession extensions (based on spark.sql.extensions configuration property) and applies them to the SparkSessionExtensions.
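
A short sketch of the "active or new" behaviour, assuming a local Spark installation on the classpath and no other session running in the JVM. The GetOrCreateDemo object and the application name are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical helper object, used for illustration only
object GetOrCreateDemo {

  // Creates a session, asks the builder for a session again,
  // and reports whether both references point to the same instance
  def sameSessionReturned(): Boolean = {
    val first = SparkSession.builder
      .master("local[*]")               // demo only
      .appName("get-or-create-demo")    // hypothetical name
      .getOrCreate()
    try {
      // A session is already active, so no new one should be created here
      val second = SparkSession.builder.getOrCreate()
      first eq second
    } finally first.stop()
  }

  def main(args: Array[String]): Unit =
    println(sameSessionReturned())
}
```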

SparkSessionExtensions

extensions: SparkSessionExtensions

Builder creates a new SparkSessionExtensions when it is created itself.

The SparkSessionExtensions is used to apply SparkSession extensions registered using the spark.sql.extensions configuration property or the Builder.withExtensions method.

In the end, Builder uses the SparkSessionExtensions to create a new SparkSession.

Registering SparkSessionExtensions

withExtensions(
  f: SparkSessionExtensions => Unit): Builder

withExtensions allows registering SparkSession extensions using SparkSessionExtensions.

withExtensions simply executes the input f function with the SparkSessionExtensions.
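
As an illustration, the following sketch registers a do-nothing optimizer rule through withExtensions. NoopRule and WithExtensionsDemo are hypothetical names introduced for this example. The sketch assumes Spark SQL (with Catalyst) is on the classpath.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.rules.Rule

// Hypothetical optimizer rule that returns every plan unchanged
case class NoopRule(spark: SparkSession) extends Rule[LogicalPlan] {
  override def apply(plan: LogicalPlan): LogicalPlan = plan
}

// Hypothetical helper object, used for illustration only
object WithExtensionsDemo {

  // Builds a session with the no-op rule injected and runs a trivial query
  def countWithExtension(): Long = {
    val spark = SparkSession.builder
      .master("local[*]")               // demo only
      .appName("with-extensions-demo")  // hypothetical name
      .withExtensions { extensions =>
        // The function receives the SparkSessionExtensions to register rules with
        extensions.injectOptimizerRule(session => NoopRule(session))
      }
      .getOrCreate()
    try spark.range(3).count()
    finally spark.stop()
  }

  def main(args: Array[String]): Unit =
    println(countWithExtension())
}
```

Because the rule leaves plans untouched, query results are the same as without the extension. A real rule would pattern-match on the LogicalPlan and rewrite it.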

hiveClassesArePresent

hiveClassesArePresent: Boolean

hiveClassesArePresent loads and initializes org.apache.spark.sql.hive.HiveSessionStateBuilder and org.apache.hadoop.hive.conf.HiveConf classes from the current classloader.

hiveClassesArePresent returns true when the initialization succeeds, and false otherwise (due to a ClassNotFoundException or NoClassDefFoundError).
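
The check can be approximated in plain Scala as follows. This is a sketch of the idea and not Spark's actual code: ClassPresence and classesArePresent are hypothetical names, and the choice of the thread context classloader (with a fallback) is an assumption.

```scala
// Hypothetical helper object, used for illustration only
object ClassPresence {

  // Returns true only if every named class can be loaded and initialized
  def classesArePresent(classNames: String*): Boolean =
    try {
      // Assumption: prefer the thread context classloader, fall back to our own
      val loader = Option(Thread.currentThread().getContextClassLoader)
        .getOrElse(getClass.getClassLoader)
      // The second argument (true) requests class initialization, not just loading
      classNames.foreach(name => Class.forName(name, true, loader))
      true
    } catch {
      case _: ClassNotFoundException | _: NoClassDefFoundError => false
    }

  def main(args: Array[String]): Unit = {
    val hivePresent = classesArePresent(
      "org.apache.spark.sql.hive.HiveSessionStateBuilder",
      "org.apache.hadoop.hive.conf.HiveConf")
    println(s"Hive classes present: $hivePresent")
  }
}
```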

hiveClassesArePresent is used when: