# FetchFailedException

FetchFailedException may be thrown when a task runs and ShuffleBlockFetcherIterator could not fetch shuffle blocks.

When a FetchFailedException is reported, TaskRunner catches it and notifies the ExecutorBackend (with the TaskState.FAILED task state).
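A simplified sketch of that handling, modeled on Executor's TaskRunner (not the exact Spark sources; `task`, `taskId`, `ser` and `execBackend` stand for TaskRunner's own fields):

```scala
// Inside TaskRunner.run()'s exception handling (simplified): when the
// TaskContext has a fetch failure recorded, report the task as FAILED to
// the ExecutorBackend with the original FetchFailedException as the reason.
try {
  // ... run the task ...
} catch {
  case t: Throwable if task.context.fetchFailed.isDefined =>
    val reason = task.context.fetchFailed.get.toTaskFailedReason
    execBackend.statusUpdate(taskId, TaskState.FAILED, ser.serialize(reason))
}
```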

## Creating Instance

FetchFailedException takes the following to be created:

* BlockManagerId (of the BlockManager the shuffle blocks were requested from)
* Shuffle ID
* Map ID
* Map Index
* Reduce ID
* Error Message
* Error Cause (default: null)

While being created, FetchFailedException requests the current TaskContext to setFetchFailed.
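That registration is essentially a one-liner in the constructor body; a minimal sketch (assuming the Spark 3.x sources):

```scala
import org.apache.spark.TaskContext

// Executed while a FetchFailedException is being created: record the new
// exception in the TaskContext of the currently-running task (if any), so
// the fetch failure is preserved even if user code catches and wraps it.
Option(TaskContext.get()).foreach(_.setFetchFailed(this))
```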

FetchFailedException is created when:

* ShuffleBlockFetcherIterator is requested to throwFetchFailedException

## Error Cause

FetchFailedException can be given an error cause when created.

The root cause of a FetchFailedException is usually that the executor (with the BlockManager hosting the requested shuffle blocks) was lost and is no longer available, due to one of the following:

  1. An OutOfMemoryError (aka OOM) or some other unhandled exception was thrown
  2. The cluster manager that manages the workers running your Spark application's executors (e.g. Kubernetes, Hadoop YARN) enforces container memory limits and eventually kills the executor due to excessive memory usage

A solution is usually to tune the memory of your Spark application (see the sketch below).
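For example, a hedged sketch of raising the executor memory settings (the property names are real Spark configuration; the values are illustrative and workload-specific):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Illustrative values only: more JVM heap per executor plus more off-heap
// headroom, so the container stays within the cluster manager's memory limit.
val conf = new SparkConf()
  .setAppName("fetch-failed-memory-tuning") // hypothetical application name
  .set("spark.executor.memory", "8g")         // executor JVM heap
  .set("spark.executor.memoryOverhead", "2g") // non-heap memory per executor
val sc = new SparkContext(conf)
```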

## TaskContext

TaskContext comes with setFetchFailed and fetchFailed to hold a FetchFailedException unmodified (regardless of what happens in user code).
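A minimal sketch of that bookkeeping, modeled on TaskContextImpl (simplified; not the exact Spark sources):

```scala
import org.apache.spark.shuffle.FetchFailedException

// A simplified model of TaskContextImpl's fetch-failure bookkeeping: once
// recorded, the exception is kept unmodified so TaskRunner can report the
// original fetch failure even if user code caught and wrapped it.
class TaskContextSketch {
  @volatile private var _fetchFailedException: Option[FetchFailedException] = None

  def setFetchFailed(fetchFailed: FetchFailedException): Unit =
    _fetchFailedException = Option(fetchFailed)

  def fetchFailed: Option[FetchFailedException] = _fetchFailedException
}
```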