FetchFailedException¶
FetchFailedException may be thrown while a task runs (when ShuffleBlockFetcherIterator could not fetch shuffle blocks).
When FetchFailedException is reported, TaskRunner catches it and notifies the ExecutorBackend (with TaskState.FAILED task state).
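As a rough sketch of that control flow (condensed from the TaskRunner catch block in the Spark sources; serialization and bookkeeping details are elided, and `task`, `taskId`, `ser` and `execBackend` are TaskRunner's own fields):

```scala
// Condensed sketch of TaskRunner's handling of a fetch failure
case t: Throwable if task.context.fetchFailed.isDefined =>
  // The FetchFailedException was recorded on the TaskContext via setFetchFailed
  val reason = task.context.fetchFailed.get.toTaskFailedReason
  // Notify the ExecutorBackend that the task has failed
  execBackend.statusUpdate(taskId, TaskState.FAILED, ser.serialize(reason))
```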
Creating Instance¶
FetchFailedException takes the following to be created:
- BlockManagerId
- Shuffle ID
- Map ID
- Map Index
- Reduce ID
- Error Message
- Error Cause
While being created, FetchFailedException requests the current TaskContext to setFetchFailed.
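Put together, a simplified sketch of the class (modeled on the Spark 3.x sources, where the parameter list mirrors the bullet list above; access modifiers and defaults vary across Spark versions):

```scala
import org.apache.spark.TaskContext
import org.apache.spark.storage.BlockManagerId

// Simplified sketch of FetchFailedException (Spark 3.x parameter list)
class FetchFailedException(
    bmAddress: BlockManagerId, // where the shuffle blocks were requested from
    shuffleId: Int,
    mapId: Long,
    mapIndex: Int,
    reduceId: Int,
    message: String,
    cause: Throwable = null)
  extends Exception(message, cause) {

  // Record the failure on the current TaskContext (if any) so it is
  // preserved unmodified, no matter what user code does with the exception
  Option(TaskContext.get()).foreach(_.setFetchFailed(this))
}
```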
FetchFailedException is created when:

- ShuffleBlockFetcherIterator is requested to throw a FetchFailedException (for a ShuffleBlockId or a ShuffleBlockBatchId)
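A condensed sketch of that code path (modeled on ShuffleBlockFetcherIterator.throwFetchFailedException in the Spark sources; the fallback branch for other block types is omitted, and `blockId`, `mapIndex`, `address`, `msg` and `e` come from the failed fetch):

```scala
blockId match {
  case ShuffleBlockId(shufId, mapId, reduceId) =>
    throw new FetchFailedException(address, shufId, mapId, mapIndex, reduceId, msg, e)
  case ShuffleBlockBatchId(shuffleId, mapId, startReduceId, _) =>
    // For a block batch, the first reduce ID in the batch is reported
    throw new FetchFailedException(address, shuffleId, mapId, mapIndex, startReduceId, msg, e)
}
```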
Error Cause¶
FetchFailedException can be given an error cause when created.
The root cause of a FetchFailedException is usually that the Executor (with the BlockManager hosting the requested shuffle blocks) is lost and no longer available, due to one of the following:

- An OutOfMemoryError was thrown (aka OOMed) or some other unhandled exception occurred
- The cluster manager that manages the workers with the executors of your Spark application (e.g. Kubernetes, Hadoop YARN) enforces the container memory limits and eventually decides to kill the executor due to excessive memory usage
A solution is usually to tune the memory of your Spark application.
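For example, you could raise the executor heap and the off-heap overhead that cluster managers count against the container limit (the property names are standard Spark configuration; the values below are purely illustrative):

```scala
import org.apache.spark.sql.SparkSession

// Illustrative values only; tune to your workload and cluster manager limits
val spark = SparkSession.builder()
  .appName("shuffle-heavy-app")
  .config("spark.executor.memory", "8g")          // executor JVM heap
  .config("spark.executor.memoryOverhead", "2g")  // off-heap headroom per executor
  .getOrCreate()
```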
TaskContext¶
TaskContext comes with setFetchFailed and fetchFailed to hold a FetchFailedException unmodified (regardless of what happens in user code).
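A minimal sketch of how TaskContextImpl could hold the exception (modeled on the Spark sources):

```scala
// Stored as-is; user code catching and re-wrapping the exception
// does not change what the TaskRunner eventually reports
private var _fetchFailedException: Option[FetchFailedException] = None

private[spark] def setFetchFailed(fetchFailed: FetchFailedException): Unit = {
  _fetchFailedException = Option(fetchFailed)
}

private[spark] def fetchFailed: Option[FetchFailedException] = _fetchFailedException
```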