Spark Standalone - Using ZooKeeper for High-Availability of Master

== Spark Standalone - Using ZooKeeper for High-Availability of Master

TIP: Read ../spark-standalone-Master.md#recovery-mode[Recovery Mode] to know the theory.

You're going to start two standalone Masters.

You'll need 4 terminals (adjust addresses as needed):

Start ZooKeeper.

Create a configuration file ha.conf with the content as follows:

spark.deploy.recoveryMode=ZOOKEEPER
spark.deploy.zookeeper.url=<zookeeper_host>:2181
spark.deploy.zookeeper.dir=/spark

Start the first standalone Master.

./sbin/start-master.sh -h localhost -p 7077 --webui-port 8080 --properties-file ha.conf

Start the second standalone Master.

NOTE: It is not possible to start another instance of standalone Master on the same machine using ./sbin/start-master.sh. The reason is that the script assumes one instance per machine only. We're going to change the script to make it possible.

$ cp ./sbin/start-master{,-2}.sh

$ grep "CLASS 1" ./sbin/start-master-2.sh
"$\{SPARK_HOME}/sbin"/spark-daemon.sh start $CLASS 1 \

$ sed -i -e 's/CLASS 1/CLASS 2/' sbin/start-master-2.sh

$ grep "CLASS 1" ./sbin/start-master-2.sh

$ grep "CLASS 2" ./sbin/start-master-2.sh
"$\{SPARK_HOME}/sbin"/spark-daemon.sh start $CLASS 2 \

$ ./sbin/start-master-2.sh -h localhost -p 17077 --webui-port 18080 --properties-file ha.conf

You can check how many instances you're currently running using jps command as follows:

$ jps -lm
5024 sun.tools.jps.Jps -lm
4994 org.apache.spark.deploy.master.Master --ip japila.local --port 7077 --webui-port 8080 -h localhost -p 17077 --webui-port 18080 --properties-file ha.conf
4808 org.apache.spark.deploy.master.Master --ip japila.local --port 7077 --webui-port 8080 -h localhost -p 7077 --webui-port 8080 --properties-file ha.conf
4778 org.apache.zookeeper.server.quorum.QuorumPeerMain config/zookeeper.properties

Start a standalone Worker.

./sbin/start-slave.sh spark://localhost:7077,localhost:17077

Start Spark shell.

./bin/spark-shell --master spark://localhost:7077,localhost:17077

Wait till the Spark shell connects to an active standalone Master.

Find out which standalone Master is active (there can only be one). Kill it. Observe how the other standalone Master takes over and lets the Spark shell register with itself. Check out the master's UI.

Optionally, kill the worker, make sure it goes away instantly in the active master's logs.