
== [[spark-shell]] Spark Shell -- spark-shell shell script

Spark shell is an interactive environment where you can learn how to make the most out of Apache Spark quickly and conveniently.

TIP: Spark shell is particularly helpful for fast interactive prototyping.

Under the covers, Spark shell is a standalone Spark application written in Scala that offers an environment with auto-completion (using the TAB key) where you can run ad-hoc queries and get familiar with the features of Spark (which help you develop your own standalone Spark applications). It is a very convenient tool for exploring the many things available in Spark with immediate feedback. It is one of the many reasons why spark-overview.md#why-spark[Spark is so helpful for tasks to process datasets of any size].

There are variants of Spark shell for different languages: spark-shell for Scala, pyspark for Python and sparkR for R.

NOTE: This document (and the book in general) uses spark-shell for Scala only.

You can start Spark shell using the <<using-spark-shell, spark-shell script>>.

$ ./bin/spark-shell
scala>

spark-shell is an extension of Scala REPL with automatic instantiation of spark-sql-SparkSession.md[SparkSession] as spark (and ROOT:SparkContext.md[] as sc).

[source, scala]

scala> :type spark
org.apache.spark.sql.SparkSession

// Learn the current version of Spark in use
scala> spark.version
res0: String = 2.1.0-SNAPSHOT


spark-shell also imports spark-sql-SparkSession.md#implicits[Scala SQL's implicits] and the spark-sql-SparkSession.md#sql[sql method].

[source, scala]

scala> :imports
 1) import spark.implicits._       (59 terms, 38 are implicit)
 2) import spark.sql               (1 terms)
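The effect of these two imports can be sketched in a standalone application, where they have to be written by hand. This is a minimal sketch, not what spark-shell runs internally: the object name `ImplicitsSketch`, the helper names, the temporary view name `numbers` and the application name are all made up for illustration.

```scala
import org.apache.spark.sql.SparkSession

object ImplicitsSketch {
  // Hypothetical helper: a local session comparable to the one spark-shell creates.
  def localSession(): SparkSession =
    SparkSession.builder()
      .master("local[*]")
      .appName("implicits-sketch") // assumed name, for illustration only
      .getOrCreate()

  // Uses both imports that spark-shell performs automatically.
  def doubledValues(spark: SparkSession): Array[Int] = {
    import spark.implicits._ // brings toDS and the Int encoder into scope
    import spark.sql         // lets us call sql(...) instead of spark.sql(...)

    // A Dataset[Int] has a single column named "value".
    Seq(1, 2, 3).toDS().createOrReplaceTempView("numbers")
    sql("SELECT value * 2 AS doubled FROM numbers").as[Int].collect()
  }

  def main(args: Array[String]): Unit = {
    val spark = localSession()
    println(doubledValues(spark).mkString(", "))
    spark.stop()
  }
}
```

Inside spark-shell itself the two `import` lines are unnecessary, so `Seq(1, 2, 3).toDS()` and `sql("...")` work right at the prompt.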


[NOTE]

When you execute spark-shell you actually execute spark-submit.md[Spark submit] as follows:

[options="wrap"]

org.apache.spark.deploy.SparkSubmit --class org.apache.spark.repl.Main --name Spark shell spark-shell

Set SPARK_PRINT_LAUNCH_COMMAND to see the entire command to be executed. Refer to spark-tips-and-tricks.md#SPARK_PRINT_LAUNCH_COMMAND[Print Launch Command of Spark Scripts].

=== [[using-spark-shell]] Using Spark shell

You start Spark shell using the spark-shell script (available in the bin directory).

$ ./bin/spark-shell
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
Spark context Web UI available at http://10.47.71.138:4040
Spark context available as 'sc' (master = local[*], app id = local-1477858597347).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.1.0-SNAPSHOT
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_112)
Type in expressions to have them evaluated.
Type :help for more information.

scala>

Spark shell creates an instance of spark-sql-SparkSession.md[SparkSession] under the name spark for you (so you don't have to know the details of how to do it yourself on day 1).

scala> :type spark
org.apache.spark.sql.SparkSession

Spark shell also creates the sc value, which is an instance of ROOT:SparkContext.md[].

scala> :type sc
org.apache.spark.SparkContext
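For comparison, here is a rough sketch of how a standalone application could create the two values that Spark shell provides out of the box. It only approximates the shell's setup. The object name `ShellLikeSession` is hypothetical, and the application name and master are copied from the launch command and the startup banner shown earlier rather than taken from the shell's source code.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.sql.SparkSession

object ShellLikeSession {
  // Builds (or reuses) a session configured like the one in the banner above.
  def create(): SparkSession =
    SparkSession.builder()
      .appName("Spark shell") // assumption: mirrors the --name argument
      .master("local[*]")     // assumption: mirrors master = local[*]
      .getOrCreate()

  def main(args: Array[String]): Unit = {
    val spark: SparkSession = create()
    // The SparkContext is owned by the session; the shell exposes it as sc.
    val sc: SparkContext = spark.sparkContext

    println(s"Spark version: ${spark.version}")
    println(s"Master: ${sc.master}, app id: ${sc.applicationId}")

    spark.stop()
  }
}
```

Using `getOrCreate` means the sketch reuses an already running session instead of failing if one exists.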

To close Spark shell, press Ctrl+D or type :q (or any longer prefix of :quit).

scala> :q

=== [[settings]] Settings

.Spark Properties
[cols="1,1,2",options="header",width="100%"]
|===
| Spark Property
| Default Value
| Description

| [[spark_repl_class_uri]] spark.repl.class.uri
| null
| Used in spark-shell to create REPL ClassLoader to load new classes defined in the Scala REPL as a user types code.

Enable the INFO logging level for the executor:Executor.md[org.apache.spark.executor.Executor] logger to have the value printed out to the logs:

INFO Using REPL class URI: [classUri]

|===
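Besides reading the logs, the property can also be looked up on the configuration of a running application. The sketch below is one possible way to do that. The object and method names are hypothetical, and whether the property is set at all depends on the Spark version and on whether the code runs inside spark-shell, so the lookup returns an `Option`.

```scala
import org.apache.spark.sql.SparkSession

object ReplClassUriSketch {
  val ReplClassUriKey = "spark.repl.class.uri"

  // Looks the property up in the SparkConf of the running application.
  def replClassUri(spark: SparkSession): Option[String] =
    spark.sparkContext.getConf.getOption(ReplClassUriKey)

  // Turns the lookup result into a readable message.
  def describe(uri: Option[String]): String =
    uri.fold(s"$ReplClassUriKey is not set")(u => s"$ReplClassUriKey = $u")
}
```

In spark-shell, the already available `spark` value can be passed in directly, for example `ReplClassUriSketch.describe(ReplClassUriSketch.replClassUri(spark))`.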


Last update: 2020-10-06