Speculative Execution of Tasks
Speculative tasks (also speculatable tasks or task stragglers) are tasks that run slower than most (FIXME the setting) of all the tasks in a job.
Speculative execution of tasks is a health-check procedure that checks for tasks to be speculated, i.e. tasks that run slower in a stage than the median of all successfully completed tasks in a taskset (FIXME the setting). Such slow tasks are re-submitted to another worker. Spark does not stop the slow tasks, but runs a new copy in parallel.
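The median-based check described above can be sketched as follows. This is a minimal illustration, not Spark's implementation: the class `SpeculationCheckSketch`, its methods, and the `multiplier` parameter are hypothetical names standing in for the setting marked FIXME above.

```java
import java.util.Arrays;

// Hypothetical sketch of a median-based straggler check (not Spark's code).
final class SpeculationCheckSketch {

    // Median of the durations (in ms) of successfully completed tasks.
    static double median(long[] completedDurationsMs) {
        if (completedDurationsMs.length == 0) {
            throw new IllegalArgumentException("no completed tasks yet");
        }
        long[] sorted = completedDurationsMs.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1
            ? sorted[mid]
            : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    // A running task is a candidate for speculation when it has already run
    // longer than `multiplier` times the median (multiplier is an assumed knob).
    static boolean isSpeculatable(long runningTimeMs, long[] completedDurationsMs, double multiplier) {
        return runningTimeMs > multiplier * median(completedDurationsMs);
    }

    public static void main(String[] args) {
        long[] completed = {100, 120, 110, 130};                  // median is 115 ms
        System.out.println(isSpeculatable(400, completed, 1.5));  // prints true
        System.out.println(isSpeculatable(150, completed, 1.5));  // prints false
    }
}
```

Comparing against the median rather than the mean keeps a few very slow completed tasks from inflating the threshold.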
The thread starts as TaskSchedulerImpl starts in ROOT:spark-cluster.md[clustered deployment modes] with ROOT:configuration-properties.md#spark.speculation[spark.speculation] enabled. It executes periodically every ROOT:configuration-properties.md#spark.speculation.interval[spark.speculation.interval] after the initial delay.
When enabled, you should see the following INFO message in the logs:
Starting speculative execution thread
It works as scheduler:TaskSchedulerImpl.md#task-scheduler-speculation[task-scheduler-speculation daemon thread pool] (using j.u.c.ScheduledThreadPoolExecutor with core pool size of 1).
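To illustrate the threading model, here is a minimal sketch of a single-threaded daemon scheduler that runs a periodic check. It assumes the check is scheduled with a fixed delay and that the initial delay equals the interval; the class `SpeculationThreadSketch` and its methods are hypothetical, and only the `java.util.concurrent` calls are real APIs.

```java
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of a periodic daemon check thread (not Spark's code).
final class SpeculationThreadSketch {

    // A scheduler with a core pool size of 1 whose single thread is a daemon,
    // so it never keeps the JVM alive on its own.
    static ScheduledThreadPoolExecutor newDaemonSingleThreadScheduler(String threadName) {
        ThreadFactory factory = runnable -> {
            Thread t = new Thread(runnable, threadName);
            t.setDaemon(true);
            return t;
        };
        return new ScheduledThreadPoolExecutor(1, factory);
    }

    // Runs a counting "check" every intervalMs for roughly runForMs,
    // then stops the scheduler and returns how many checks ran.
    static int runChecks(long intervalMs, long runForMs) {
        AtomicInteger checks = new AtomicInteger();
        ScheduledThreadPoolExecutor scheduler =
            newDaemonSingleThreadScheduler("task-scheduler-speculation");
        // Assumption: the initial delay is the same as the interval.
        scheduler.scheduleWithFixedDelay(
            checks::incrementAndGet, intervalMs, intervalMs, TimeUnit.MILLISECONDS);
        try {
            Thread.sleep(runForMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        scheduler.shutdownNow();
        return checks.get();
    }

    public static void main(String[] args) {
        System.out.println("checks run: " + runChecks(100, 550));
    }
}
```

With a fixed delay, the next check is scheduled only after the previous one finishes, so a slow check never overlaps with the next one.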
A job with speculatable tasks can finish while speculative tasks are still running, and it will leave these tasks running (there is no KILL command yet).
The speculative execution thread runs the checkSpeculatableTasks method, which asks rootPool to check for speculatable tasks. If there are any, SchedulerBackend is called for scheduler:SchedulerBackend.md#reviveOffers[reviveOffers].
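The control flow of that check can be sketched as below. The `Pool` and `Backend` interfaces are simplified, hypothetical stand-ins for rootPool and SchedulerBackend; the real Spark types have more members and different signatures.

```java
// Hypothetical sketch of the check-then-revive flow (not Spark's code).
final class CheckSpeculatableTasksSketch {

    // Stand-in for rootPool: reports whether any task should be speculated.
    interface Pool {
        boolean checkSpeculatableTasks();
    }

    // Stand-in for SchedulerBackend: asks for fresh resource offers.
    interface Backend {
        void reviveOffers();
    }

    // Asks the pool for speculatable tasks and revives offers only when
    // there is at least one, so that task copies can be scheduled.
    static boolean checkSpeculatableTasks(Pool rootPool, Backend backend) {
        boolean shouldRevive = rootPool.checkSpeculatableTasks();
        if (shouldRevive) {
            backend.reviveOffers();
        }
        return shouldRevive;
    }

    public static void main(String[] args) {
        Backend backend = () -> System.out.println("reviveOffers called");
        checkSpeculatableTasks(() -> true, backend);   // prints "reviveOffers called"
        checkSpeculatableTasks(() -> false, backend);  // prints nothing
    }
}
```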
CAUTION: FIXME How does Spark handle repeated results of speculative tasks since there are copies launched?