Spark Tips and Tricks
= Spark Tips and Tricks
== [[SPARK_PRINT_LAUNCH_COMMAND]] Print Launch Command of Spark Scripts
SPARK_PRINT_LAUNCH_COMMAND environment variable controls whether the Spark launch command is printed out to the standard error output, i.e.
System.err, or not.
Spark Command: [here comes the command] ========================================
All the Spark shell scripts use
org.apache.spark.launcher.Main class internally that checks
SPARK_PRINT_LAUNCH_COMMAND and when set (to any value) will print out the entire command line to launch it.
$ SPARK_PRINT_LAUNCH_COMMAND=1 ./bin/spark-shell Spark Command: /Library/Java/JavaVirtualMachines/Current/Contents/Home/bin/java -cp /Users/jacek/dev/oss/spark/conf/:/Users/jacek/dev/oss/spark/assembly/target/scala-2.11/spark-assembly-1.6.0-SNAPSHOT-hadoop2.7.1.jar:/Users/jacek/dev/oss/spark/lib_managed/jars/datanucleus-api-jdo-3.2.6.jar:/Users/jacek/dev/oss/spark/lib_managed/jars/datanucleus-core-3.2.10.jar:/Users/jacek/dev/oss/spark/lib_managed/jars/datanucleus-rdbms-3.2.9.jar -Dscala.usejavacp=true -Xms1g -Xmx1g org.apache.spark.deploy.SparkSubmit --master spark://localhost:7077 --class org.apache.spark.repl.Main --name Spark shell spark-shell ========================================
== Show Spark version in Spark shell
In spark-shell, use
org.apache.spark.SPARK_VERSION to know the Spark version:
scala> sc.version res0: String = 1.6.0-SNAPSHOT scala> org.apache.spark.SPARK_VERSION res1: String = 1.6.0-SNAPSHOT
== Resolving local host name
When you face networking issues when Spark can't resolve your local hostname or IP address, use the preferred
SPARK_LOCAL_HOSTNAME environment variable as the custom host name or
SPARK_LOCAL_IP as the custom IP that is going to be later resolved to a hostname.
Spark checks them out before using http://docs.oracle.com/javase/8/docs/api/java/net/InetAddress.html#getLocalHost--[java.net.InetAddress.getLocalHost()] (consult https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/Utils.scala#L759[org.apache.spark.util.Utils.findLocalInetAddress()] method).
You may see the following WARN messages in the logs when Spark finished the resolving process:
WARN Your hostname, [hostname] resolves to a loopback address: [host-address]; using... WARN Set SPARK_LOCAL_IP if you need to bind to another address
== [[spark-standalone-windows]] Starting standalone Master and workers on Windows 7
Windows 7 users can use spark-class.md[spark-class] to start spark-standalone.md[Spark Standalone] as there are no launch scripts for the Windows platform.
$ ./bin/spark-class org.apache.spark.deploy.master.Master -h localhost
$ ./bin/spark-class org.apache.spark.deploy.worker.Worker spark://localhost:7077