= Causing Stage to Fail

== Exercise: Causing Stage to Fail

This exercise shows how Spark re-executes a stage when the stage fails, e.g. after an executor that hosts shuffle map output is lost.

=== Recipe

Start a Spark cluster, e.g. a single-node Hadoop YARN cluster.

[source,shell]
----
start-yarn.sh
----

Run a two-stage job (it _appears_ that a stage can only be failed when there is a shuffle):

[source,scala]
----
sc.parallelize(0 to 3e3.toInt, 2).map(n => (n % 2, n)).groupByKey.count
----
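Conceptually, the driver resubmits a failed stage as a new stage attempt until a retry limit is reached (controlled by `spark.stage.maxConsecutiveAttempts`, 4 by default). A minimal, Spark-independent sketch of that retry loop (the names `StageRetryDemo`, `runWithRetries`, and `StageFailed` are illustrative, not Spark APIs):

[source,scala]
----
// Sketch of stage re-execution: re-run a computation that may fail
// (e.g. with a shuffle fetch failure) up to maxAttempts times.
// All names here are illustrative; this is not Spark's actual code.
object StageRetryDemo {
  final case class StageFailed(attempt: Int)
      extends Exception(s"stage failed on attempt $attempt")

  // Re-run `body` until it succeeds or maxAttempts is exhausted,
  // mirroring spark.stage.maxConsecutiveAttempts (default 4).
  def runWithRetries[T](maxAttempts: Int)(body: Int => T): T = {
    var attempt = 0
    var result: Option[T] = None
    while (result.isEmpty) {
      attempt += 1
      try result = Some(body(attempt))
      catch {
        // Below the limit: swallow the failure and resubmit the stage.
        case _: StageFailed if attempt < maxAttempts => ()
      }
    }
    result.get
  }

  def main(args: Array[String]): Unit = {
    // Simulate a stage whose first two attempts fail (e.g. an executor died).
    val count = runWithRetries(maxAttempts = 4) { attempt =>
      if (attempt <= 2) throw StageFailed(attempt)
      2L // the result eventually produced by the re-executed stage
    }
    println(s"count = $count") // prints: count = 2
  }
}
----

Once the retry budget is exhausted, the exception propagates and the whole job is failed, which matches what you observe in the demo when no healthy executor can serve the shuffle data.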

Use at least 2 executors so you can kill one and keep the application up and running on the other. The external shuffle service keeps shuffle files available after an executor dies.

[source,shell]
----
YARN_CONF_DIR=hadoop-conf ./bin/spark-shell --master yarn \
  -c spark.shuffle.service.enabled=true \
  --num-executors 2
----
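To trigger the stage failure, kill one of the executor JVMs while the job runs (re-run the two-stage job if needed). Executors run as `CoarseGrainedExecutorBackend` processes, so one way to find and kill one (assuming a JDK's `jps` is on the `PATH`) is:

[source,shell]
----
# List executor JVMs -- Spark executors run as CoarseGrainedExecutorBackend
jps -l | grep CoarseGrainedExecutorBackend
# Kill one of them (substitute a PID printed above)
kill -9 <PID>
----

Re-run the job afterwards and watch the Stages tab of the web UI, where the failed stage shows up again as a new stage attempt.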

Last update: 2020-10-06