Spark History Server

Spark History Server is the web UI for completed and running (aka incomplete) Spark applications. It is an extension of Spark’s web UI.

Figure 1. History Server’s web UI

Enable collecting events in your Spark applications using the spark.eventLog.enabled Spark property (it is disabled by default).
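As a minimal sketch of enabling event collection, the snippet below writes a properties file for an application. The file name eventlog.properties is hypothetical, and spark.eventLog.dir is an additional Spark property (not covered in this section) that sets where the application writes its events; it is assumed here to point at the same directory History Server reads from.

```shell
# Write a hypothetical eventlog.properties for a Spark application.
# spark.eventLog.dir is an assumption of this sketch: it should match
# the directory that History Server scans for event logs.
cat > eventlog.properties <<'EOF'
spark.eventLog.enabled  true
spark.eventLog.dir      file:/tmp/spark-events
EOF

# Show what was written.
cat eventlog.properties
```

The file could then be passed to an application with spark-submit's --properties-file option, or the same two properties could be given individually with --conf.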

You can start History Server by executing the start-history-server.sh shell script and stop it using stop-history-server.sh.

start-history-server.sh accepts the --properties-file [propertiesFile] command-line option, which specifies a properties file with custom Spark properties.

$ ./sbin/start-history-server.sh --properties-file history.properties

If not specified explicitly, Spark History Server uses the default configuration file, i.e. spark-defaults.conf.
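A properties file for the --properties-file option might look like the following sketch. The file name history.properties matches the command above; the two values are illustrative assumptions that simply repeat the defaults listed under Settings.

```shell
# Write an illustrative history.properties (values are assumptions, not requirements).
cat > history.properties <<'EOF'
# Where History Server looks for event logs
spark.history.fs.logDirectory  file:/tmp/spark-events
# Port of the History Server's web UI
spark.history.ui.port          18080
EOF

# Show what was written.
cat history.properties
```

Keeping History Server settings in a separate file like this leaves spark-defaults.conf untouched for regular applications.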

Enable the INFO logging level for the org.apache.spark.deploy.history logger to see what happens inside History Server.

Add the following line to conf/log4j.properties:

log4j.logger.org.apache.spark.deploy.history=INFO

Refer to Logging.

Starting History Server — start-history-server.sh script

You can start a HistoryServer instance by executing the $SPARK_HOME/sbin/start-history-server.sh script (where SPARK_HOME is the directory of your Spark installation).

$ ./sbin/start-history-server.sh
starting org.apache.spark.deploy.history.HistoryServer, logging to .../spark/logs/spark-jacek-org.apache.spark.deploy.history.HistoryServer-1-japila.out

Internally, the start-history-server.sh script starts the org.apache.spark.deploy.history.HistoryServer standalone application (using the spark-daemon.sh shell script).

$ ./bin/spark-class org.apache.spark.deploy.history.HistoryServer

Starting Spark History Server with spark-class directly can make execution easier to trace, because the logs are printed to the standard output and hence straight to the terminal.

When started, it prints out the following INFO message to the logs:

INFO HistoryServer: Started daemon with process name: [processName]

It registers signal handlers (using SignalUtils) for TERM, HUP, INT to log their execution:

ERROR HistoryServer: RECEIVED SIGNAL [signal]

It initializes security if enabled (using the spark.history.kerberos.enabled setting).

FIXME Describe initSecurity

It creates a SecurityManager.

It creates a HistoryServer and requests it to bind to the port given by spark.history.ui.port.

The host’s IP can be specified using the SPARK_LOCAL_IP environment variable (defaults to 0.0.0.0).

You should see the following INFO message in the logs:

INFO HistoryServer: Bound HistoryServer to [host], and started at [webUrl]

It registers a shutdown hook to call stop on the HistoryServer instance.

Use the stop-history-server.sh shell script to stop a running History Server.

Stopping History Server — stop-history-server.sh script

You can stop a running instance of HistoryServer using the $SPARK_HOME/sbin/stop-history-server.sh shell script.

$ ./sbin/stop-history-server.sh
stopping org.apache.spark.deploy.history.HistoryServer

Settings

Table 1. Spark Properties

Setting | Default Value | Description
spark.history.ui.port | 18080 | The port of the History Server’s UI.
spark.history.fs.logDirectory | file:/tmp/spark-events | The directory with the event logs. The directory has to exist before starting History Server.
spark.history.retainedApplications | 50 | How many Spark applications to retain.
spark.history.ui.maxApplications | (unbounded) | How many Spark applications to show in the UI.
spark.history.kerberos.enabled | false | Enable security when working with HDFS with security enabled (Kerberos).
spark.history.kerberos.principal | (empty) | Kerberos principal. Required when spark.history.kerberos.enabled is enabled.
spark.history.kerberos.keytab | (empty) | Keytab to use for login to Kerberos. Required when spark.history.kerberos.enabled is enabled.
spark.history.provider | org.apache.spark.deploy.history.FsHistoryProvider | The fully-qualified class name for an ApplicationHistoryProvider.
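
Because the directory set by spark.history.fs.logDirectory has to exist before History Server starts, it can help to create it up front. The sketch below assumes the default location on the local file system; the EVENT_LOG_DIR variable name is only illustrative.

```shell
# Default event log directory (file:/tmp/spark-events without the file: scheme).
EVENT_LOG_DIR=/tmp/spark-events

# Create the directory if it is missing; -p makes this safe to re-run.
mkdir -p "$EVENT_LOG_DIR"

# Confirm the directory is in place before starting History Server.
ls -ld "$EVENT_LOG_DIR"
```

With the directory in place, ./sbin/start-history-server.sh can be run as shown earlier.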