FetchFailedException¶

FetchFailedException may be thrown while a task runs, when ShuffleBlockFetcherIterator cannot fetch shuffle blocks.

When a FetchFailedException is reported, TaskRunner catches it and notifies the ExecutorBackend with the TaskState.FAILED task state.

Creating Instance¶

FetchFailedException takes the following to be created:

BlockManagerId
Shuffle ID
Map ID
Map Index
Reduce ID
Error Message
Error Cause

While being created, FetchFailedException requests the current TaskContext to setFetchFailed.
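The constructor arguments and the registration side effect can be sketched as follows. This is a minimal, self-contained model and not Spark's source: SketchBlockManagerId, SketchTaskContext, SketchCurrentContext and SketchFetchFailedException are hypothetical stand-ins, and the field types are assumptions.

```scala
// Hypothetical stand-ins for illustration only; these are not Spark's classes.
final case class SketchBlockManagerId(executorId: String, host: String, port: Int)

// A task context that remembers the fetch failure it was told about.
final class SketchTaskContext {
  @volatile private var failure: Option[SketchFetchFailedException] = None

  def setFetchFailed(e: SketchFetchFailedException): Unit = {
    failure = Some(e)
  }

  def fetchFailed: Option[SketchFetchFailedException] = failure
}

// Holds the context of the task running on the current thread (assumed design).
object SketchCurrentContext {
  private val current = new ThreadLocal[SketchTaskContext]

  def get(): Option[SketchTaskContext] = Option(current.get())
  def set(ctx: SketchTaskContext): Unit = current.set(ctx)
  def unset(): Unit = current.remove()
}

// Mirrors the seven inputs listed above (field types are assumptions).
class SketchFetchFailedException(
    val bmAddress: SketchBlockManagerId,
    val shuffleId: Int,
    val mapId: Long,
    val mapIndex: Int,
    val reduceId: Int,
    message: String,
    cause: Throwable = null)
  extends Exception(message, cause) {

  // Side effect of construction: register with the current task context, if any.
  SketchCurrentContext.get().foreach(_.setFetchFailed(this))
}
```

In this model, merely constructing the exception on a task thread is enough for the context to know about the failure, even before the exception is thrown.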

FetchFailedException is created when:

ShuffleBlockFetcherIterator is requested to throw a FetchFailedException (for a ShuffleBlockId or a ShuffleBlockBatchId)

Error Cause¶

FetchFailedException can be given an error cause when created.

A FetchFailedException is usually caused by the loss of the Executor that hosts the BlockManager serving the requested shuffle blocks. The executor typically becomes unavailable for one of the following reasons:

The executor threw an OutOfMemoryError (aka OOMed) or some other unhandled exception
The cluster manager that manages the workers with the executors of your Spark application (e.g. Kubernetes, Hadoop YARN) enforces container memory limits and kills the executor for excessive memory usage

A solution is usually to tune the memory of your Spark application.
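As a starting point for such tuning, the sketch below collects a few standard Spark configuration properties and renders them as spark-submit arguments. The property names are real Spark settings, but the values are illustrative assumptions, and MemoryTuningSketch and toSubmitArgs are hypothetical helpers, not part of Spark.

```scala
object MemoryTuningSketch {
  // Illustrative values only; the right numbers depend on the workload and the cluster.
  val settings: Seq[(String, String)] = Seq(
    // JVM heap available to each executor
    "spark.executor.memory" -> "8g",
    // Off-heap headroom counted against the container memory limit
    "spark.executor.memoryOverhead" -> "2g",
    // Spark SQL only: more shuffle partitions means smaller blocks per task
    "spark.sql.shuffle.partitions" -> "400"
  )

  // Renders each setting as a pair of spark-submit arguments: --conf key=value
  def toSubmitArgs(conf: Seq[(String, String)]): Seq[String] =
    conf.flatMap { case (key, value) => Seq("--conf", s"$key=$value") }
}
```

Calling MemoryTuningSketch.toSubmitArgs(MemoryTuningSketch.settings) yields the arguments to append to a spark-submit command line. Raising the memory overhead tends to help when the cluster manager kills containers, while raising executor memory or the partition count tends to help with OutOfMemoryError.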

TaskContext¶

TaskContext comes with setFetchFailed and fetchFailed to hold a FetchFailedException unmodified (regardless of what happens in user code).
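Why holding the exception in the context matters can be shown with a small model. It is a sketch under stated assumptions, not Spark's implementation: FetchFailure, TaskContextModel and TaskRunnerModel are hypothetical names, and the choice to treat a normally returning task body as finished is this model's own.

```scala
// Hypothetical model for illustration; none of these names are Spark's.
final class FetchFailure(message: String) extends Exception(message)

final class TaskContextModel {
  private var failure: Option[FetchFailure] = None

  def setFetchFailed(e: FetchFailure): Unit = {
    failure = Some(e)
  }

  def fetchFailed: Option[FetchFailure] = failure
}

object TaskRunnerModel {
  sealed trait Outcome
  case object Finished extends Outcome
  final case class Failed(reason: Throwable) extends Outcome

  // Simulates a shuffle read that fails: record the failure first, then throw it.
  def fetchBlock(ctx: TaskContextModel): Nothing = {
    val e = new FetchFailure("shuffle block unavailable")
    ctx.setFetchFailed(e)
    throw e
  }

  // Runs the task body and decides which failure reason to report.
  def run(body: TaskContextModel => Unit): Outcome = {
    val ctx = new TaskContextModel
    try {
      body(ctx)
      // Assumption of this model: a body that returns normally counts as finished.
      Finished
    } catch {
      case t: Throwable =>
        // Prefer the recorded fetch failure over whatever user code threw.
        Failed(ctx.fetchFailed.getOrElse(t))
    }
  }
}
```

If the task body catches the FetchFailure and throws a RuntimeException that wraps it, TaskRunnerModel.run still reports the original FetchFailure, because the context kept its own reference to it.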