== Exercise: Causing Stage to Fail
This exercise shows how Spark re-executes a stage after a stage failure, e.g. when the executor that computed its tasks is lost.
=== Recipe
Start a Spark cluster, e.g. a 1-node Hadoop YARN cluster, and run the following job in `spark-shell`:
[source, scala]
----
// a 2-stage job -- it _appears_ that a stage can only fail when there is a shuffle
sc.parallelize(0 to 3e3.toInt, 2).map(n => (n % 2, n)).groupByKey.count
----
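The `groupByKey` is what splits the job into two stages: it forces a shuffle that repartitions the records by key. As a plain-Scala sketch of what the job computes (no Spark required; the object name is made up for illustration): 3001 integers are keyed by `n % 2`, so grouping yields 2 keys, and `count` on the grouped RDD returns the number of groups.

[source, scala]
----
// Stand-alone model of the Spark job above, using Scala collections:
// key every element by parity, group by key, count the groups.
object TwoStageJobSketch extends App {
  val grouped = (0 to 3e3.toInt)          // 3001 elements, like the RDD
    .map(n => (n % 2, n))                 // key by parity, as in the map step
    .groupBy(_._1)                        // what the shuffle + groupByKey produce

  // groupByKey.count counts distinct keys -- here 2 (even and odd)
  println(grouped.size)                   // prints 2
}
----

In the real job, the `map` side runs in the first stage and writes shuffle output; the grouping and `count` run in the second stage, which is the one that can be failed and resubmitted.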
Use at least 2 executors so you can kill one, e.g. by sending `kill` to one executor's `CoarseGrainedExecutorBackend` JVM, while the application keeps running on the remaining executor. Watch the web UI: the stage whose shuffle output was lost is resubmitted and re-executed.