Spark Tips and Tricks
The SPARK_PRINT_LAUNCH_COMMAND environment variable controls whether the Spark launch command is printed to the standard error output, i.e. System.err.
When printing is enabled, the output has the following form:
Spark Command: [here comes the command]
========================================
All the Spark shell scripts use the org.apache.spark.launcher.Main class internally. It checks SPARK_PRINT_LAUNCH_COMMAND and, when the variable is set (to any value), prints the entire command line used for the launch.
$ SPARK_PRINT_LAUNCH_COMMAND=1 ./bin/spark-shell
Spark Command: /Library/Java/JavaVirtualMachines/Current/Contents/Home/bin/java -cp /Users/jacek/dev/oss/spark/conf/:/Users/jacek/dev/oss/spark/assembly/target/scala-2.11/spark-assembly-1.6.0-SNAPSHOT-hadoop2.7.1.jar:/Users/jacek/dev/oss/spark/lib_managed/jars/datanucleus-api-jdo-3.2.6.jar:/Users/jacek/dev/oss/spark/lib_managed/jars/datanucleus-core-3.2.10.jar:/Users/jacek/dev/oss/spark/lib_managed/jars/datanucleus-rdbms-3.2.9.jar -Dscala.usejavacp=true -Xms1g -Xmx1g org.apache.spark.deploy.SparkSubmit --master spark://localhost:7077 --class org.apache.spark.repl.Main --name Spark shell spark-shell
========================================
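The check described above can be pictured with a small, self-contained Java sketch. It is not Spark's source code: the class name LaunchCommandSketch, its method names, and the placeholder command are made up for illustration, and the sketch assumes that an empty value counts as "not set".

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of a launcher that honors SPARK_PRINT_LAUNCH_COMMAND.
final class LaunchCommandSketch {

    // Assumption: the variable counts as set when it is present and non-empty.
    static boolean shouldPrint(String envValue) {
        return envValue != null && !envValue.isEmpty();
    }

    // Builds the two-line banner shown earlier: the command, then a separator line.
    static String banner(List<String> command) {
        return "Spark Command: " + String.join(" ", command)
            + System.lineSeparator()
            + "========================================";
    }

    public static void main(String[] args) {
        // Placeholder command for the demo, not a real Spark launch command.
        List<String> command =
            Arrays.asList("java", "-cp", "conf/", "org.apache.spark.deploy.SparkSubmit");

        if (shouldPrint(System.getenv("SPARK_PRINT_LAUNCH_COMMAND"))) {
            // The banner goes to standard error, so standard output stays clean.
            System.err.println(banner(command));
        }
    }
}
```

Because the banner is written to System.err, it can be captured separately from regular output by redirecting the standard error stream (2>) in the shell.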
In spark-shell, use sc.version or org.apache.spark.SPARK_VERSION to find out the Spark version:
scala> sc.version
res0: String = 1.6.0-SNAPSHOT

scala> org.apache.spark.SPARK_VERSION
res1: String = 1.6.0-SNAPSHOT
If you face networking issues because Spark can’t resolve your local hostname or IP address, set the preferred SPARK_LOCAL_HOSTNAME environment variable to a custom host name, or set SPARK_LOCAL_IP to a custom IP address that is later resolved to a hostname.
Spark checks both variables before falling back to java.net.InetAddress.getLocalHost() (see the org.apache.spark.util.Utils.findLocalInetAddress() method).
You may see the following WARN messages in the logs when Spark has finished the resolving process:
WARN Your hostname, [hostname] resolves to a loopback address: [host-address]; using...
WARN Set SPARK_LOCAL_IP if you need to bind to another address
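The lookup order described above (SPARK_LOCAL_HOSTNAME first, then SPARK_LOCAL_IP, then java.net.InetAddress.getLocalHost()) can be sketched in plain Java. This is a simplified illustration, not the code of Utils.findLocalInetAddress(): the class LocalHostSketch and its resolve method are hypothetical, the environment is passed in as a map so the order is easy to try out, and the sketch returns the textual address instead of performing the later hostname resolution.

```java
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Map;

// Hypothetical illustration of the precedence between the two variables.
final class LocalHostSketch {

    static String resolve(Map<String, String> env) {
        // 1. The preferred override: a custom host name, used as-is.
        String customHostname = env.get("SPARK_LOCAL_HOSTNAME");
        if (customHostname != null && !customHostname.isEmpty()) {
            return customHostname;
        }
        try {
            // 2. A custom IP address; getByName parses a literal IP without a DNS lookup.
            String customIp = env.get("SPARK_LOCAL_IP");
            if (customIp != null && !customIp.isEmpty()) {
                return InetAddress.getByName(customIp).getHostAddress();
            }
            // 3. Fallback: whatever the JVM reports for the local host.
            InetAddress fallback = InetAddress.getLocalHost();
            // A loopback result here is the situation the WARN messages above report;
            // this sketch returns it unchanged.
            return fallback.getHostAddress();
        } catch (UnknownHostException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Uses the real process environment.
        System.out.println(resolve(System.getenv()));
    }
}
```

Passing the environment as a parameter is a design choice for the sketch only; it makes the precedence visible without changing the variables of the running process.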