

= Barrier Execution Mode

Barrier Execution Mode is...FIXME

See https://jira.apache.org/jira/browse/SPARK-24374[SPIP: Barrier Execution Mode] and https://jira.apache.org/jira/browse/SPARK-24582[Design Doc].

NOTE: The barrier execution mode is experimental and it only handles limited scenarios.

In case of a task failure, instead of only restarting the failed task, Spark will abort the entire stage and re-launch all tasks for this stage.

Use the <<barrier, RDD.barrier>> transformation to mark the current stage as a <<barrier-stage, barrier stage>>.

[[barrier]]
[source, scala]
----
barrier(): RDDBarrier[T]
----

`barrier` simply creates an rdd:spark-RDDBarrier.md[RDDBarrier] that comes with the barrier-aware <<mapPartitions, mapPartitions>> transformation.

[[mapPartitions]]
[source, scala]
----
mapPartitions[S: ClassTag](
  f: Iterator[T] => Iterator[S],
  preservesPartitioning: Boolean = false): RDD[S]
----


`mapPartitions` simply changes the regular `RDD.mapPartitions` transformation to create a rdd:spark-rdd-MapPartitionsRDD.md[MapPartitionsRDD] with the rdd:spark-rdd-MapPartitionsRDD.md#isFromBarrier[isFromBarrier] flag enabled.
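The following is a minimal sketch of how the two transformations are typically chained in user code. It assumes Spark 2.4 or later on the classpath and a `local[4]` master, so that four slots are available and all four barrier tasks can be launched together. The `BarrierDemo` object, the application name and the input data are made up for illustration; `RDD.barrier`, `RDDBarrier.mapPartitions`, `BarrierTaskContext.get`, `BarrierTaskContext.barrier` and `BarrierTaskContext.getTaskInfos` are public Spark APIs.

```scala
import org.apache.spark.{BarrierTaskContext, SparkConf, SparkContext}

// Hypothetical demo object (not part of Spark)
object BarrierDemo {

  // Runs a single barrier stage with 4 tasks and returns, per task,
  // (partition id, sum of the partition's elements, number of tasks in the stage)
  def run(): Seq[(Int, Int, Int)] = {
    val conf = new SparkConf()
      .setMaster("local[4]") // assumption: 4 slots, one per barrier task
      .setAppName("barrier-demo")
    val sc = new SparkContext(conf)
    try {
      val rdd = sc.parallelize(1 to 8, numSlices = 4)
      rdd
        .barrier() // RDD[Int] => RDDBarrier[Int]
        .mapPartitions { iter =>
          val ctx = BarrierTaskContext.get()
          val sum = iter.sum
          // Global synchronization point: returns only after all tasks
          // of the stage have reached this call
          ctx.barrier()
          Iterator.single((ctx.partitionId(), sum, ctx.getTaskInfos().length))
        }
        .collect()
        .toSeq
    } finally {
      sc.stop()
    }
  }
}
```

Calling `BarrierDemo.run()` should give one tuple per partition. If the master offered fewer slots than the stage has tasks, Spark would not be able to start all barrier tasks at once and the job would be expected to fail rather than run the tasks in waves.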

* Task has a scheduler:Task.md#isBarrier[isBarrier] flag that says whether this task belongs to a barrier stage (default: `false`).

Spark must launch all the tasks of a <<barrier-stage, barrier stage>> at the same time.

An RDD is in a <<barrier-stage, barrier stage>> if at least one of its parent RDDs, or the RDD itself, is mapped from an RDDBarrier.

rdd:ShuffledRDD.md[ShuffledRDD] has the rdd:RDD.md#isBarrier[isBarrier] flag always disabled (false).

rdd:spark-rdd-MapPartitionsRDD.md[MapPartitionsRDD] is the only RDD that can have the rdd:RDD.md#isBarrier_[isBarrier] flag enabled.

rdd:spark-RDDBarrier.md#mapPartitions[RDDBarrier.mapPartitions] is the only transformation that creates a rdd:spark-rdd-MapPartitionsRDD.md[MapPartitionsRDD] with the rdd:spark-rdd-MapPartitionsRDD.md#isFromBarrier[isFromBarrier] flag enabled.
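The rules above can be sketched as a small toy model. This is not Spark code: `Node`, `Source`, `MapPartitions` and `Shuffled` are hypothetical stand-ins for RDD types, and the model only illustrates how the flag is described to propagate (enabled by `isFromBarrier`, inherited from parents, always reset at a shuffle).

```scala
// Hypothetical toy model of isBarrier propagation (not Spark classes)
object BarrierFlagModel {

  sealed trait Node {
    def parents: Seq[Node]
    def isBarrier: Boolean
  }

  // An input RDD with no parents is never in a barrier stage
  final case class Source(name: String) extends Node {
    val parents: Seq[Node] = Nil
    val isBarrier: Boolean = false
  }

  // Stand-in for MapPartitionsRDD: in a barrier stage if it was created
  // from an RDDBarrier or if any of its parents is in a barrier stage
  final case class MapPartitions(parent: Node, isFromBarrier: Boolean = false) extends Node {
    val parents: Seq[Node] = Seq(parent)
    def isBarrier: Boolean = isFromBarrier || parents.exists(_.isBarrier)
  }

  // Stand-in for ShuffledRDD: a shuffle starts a new stage, so the flag is always off
  final case class Shuffled(parent: Node) extends Node {
    val parents: Seq[Node] = Seq(parent)
    val isBarrier: Boolean = false
  }
}
```

In this model, `MapPartitions(MapPartitions(src, isFromBarrier = true))` is still in a barrier stage, while wrapping the same lineage in `Shuffled` turns the flag off for everything downstream of the shuffle.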

== [[barrier-stage]] Barrier Stage

Barrier Stage is a scheduler:Stage.md[stage] that...FIXME


Last update: 2020-10-06