Spark's Tips and Tricks

Print Launch Command of Spark Scripts

The SPARK_PRINT_LAUNCH_COMMAND environment variable controls whether the Spark launch command is printed to standard error.

All the Spark shell scripts internally use the org.apache.spark.launcher.Main class, which checks SPARK_PRINT_LAUNCH_COMMAND and, when it is set (to any non-empty value), prints the entire command line used for the launch.

$ SPARK_PRINT_LAUNCH_COMMAND=1 ./bin/spark-shell
Spark Command: /Library/Java/JavaVirtualMachines/Current/Contents/Home/bin/java -cp /Users/jacek/dev/oss/spark/conf/:/Users/jacek/dev/oss/spark/assembly/target/scala-2.11/spark-assembly-1.6.0-SNAPSHOT-hadoop2.7.1.jar:/Users/jacek/dev/oss/spark/lib_managed/jars/datanucleus-api-jdo-3.2.6.jar:/Users/jacek/dev/oss/spark/lib_managed/jars/datanucleus-core-3.2.10.jar:/Users/jacek/dev/oss/spark/lib_managed/jars/datanucleus-rdbms-3.2.9.jar -Dscala.usejavacp=true -Xms1g -Xmx1g org.apache.spark.deploy.SparkSubmit --master spark://localhost:7077 --class org.apache.spark.repl.Main --name Spark shell spark-shell
========================================
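
Because the command goes to standard error, it can be captured separately from regular output. The following is a minimal sketch and not part of Spark: extract_launch_command is a hypothetical helper and launch.log is an assumed file name. It relies only on the "Spark Command: " prefix visible in the output above.

```shell
# Hypothetical helper (not shipped with Spark): reads captured stderr on stdin
# and prints only the launch command, without the "Spark Command: " prefix.
extract_launch_command() {
  sed -n 's/^Spark Command: //p'
}

# Example usage, assuming you are in the Spark home directory:
#   SPARK_PRINT_LAUNCH_COMMAND=1 ./bin/spark-shell 2> launch.log
#   extract_launch_command < launch.log
```

Redirecting with 2> keeps the interactive shell usable while the launch command, along with anything else written to standard error, lands in the file.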

Show Spark version in Spark shell

In spark-shell, use sc.version or org.apache.spark.SPARK_VERSION to find out the Spark version:

scala> sc.version
res0: String = 1.6.0-SNAPSHOT

scala> org.apache.spark.SPARK_VERSION
res1: String = 1.6.0-SNAPSHOT

Resolving local host name

If you face networking issues because Spark can't resolve your local hostname or IP address, set the SPARK_LOCAL_HOSTNAME environment variable (preferred) to a custom hostname, or set SPARK_LOCAL_IP to a custom IP address that is later resolved to a hostname.

Spark checks both variables before falling back to java.net.InetAddress.getLocalHost() (see the org.apache.spark.util.Utils.findLocalInetAddress() method).

You may see the following WARN messages in the logs once Spark has finished the resolution process:

Your hostname, [hostname] resolves to a loopback address: [host-address]; using...
Set SPARK_LOCAL_IP if you need to bind to another address
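
The lookup order described above can be sketched as a small shell function. This only illustrates the precedence and is not Spark's actual code: resolve_local_host is a hypothetical name, and the hostname command stands in for the java.net.InetAddress.getLocalHost() fallback.

```shell
# Hypothetical illustration of the precedence (not Spark's implementation).
resolve_local_host() {
  if [ -n "${SPARK_LOCAL_HOSTNAME:-}" ]; then
    # The custom hostname wins when it is set.
    printf '%s\n' "$SPARK_LOCAL_HOSTNAME"
  elif [ -n "${SPARK_LOCAL_IP:-}" ]; then
    # Otherwise the custom IP address is used.
    printf '%s\n' "$SPARK_LOCAL_IP"
  else
    # Stand-in for the InetAddress.getLocalHost() fallback.
    hostname
  fi
}
```

In practice, either variable can be set for a single invocation by prefixing the command, for example SPARK_LOCAL_HOSTNAME=localhost ./bin/spark-shell.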

Starting standalone Master and workers on Windows 7

Windows 7 users can start Spark Standalone with spark-class, because there are no launch scripts for the Windows platform.

./bin/spark-class org.apache.spark.deploy.master.Master -h localhost
./bin/spark-class org.apache.spark.deploy.worker.Worker spark://localhost:7077