ShuffleMapTask is one of the two types of Tasks. When executed,
ShuffleMapTask writes the result of executing a serialized task code over the records (of a RDD partition) to the shuffle system and returns a MapStatus (with the BlockManager and estimated size of the result shuffle blocks).
ShuffleMapTask takes the following to be created:
- Stage ID
- Stage Attempt ID
- Broadcast variable with a serialized task binary
- Local Properties
- Serialized task metrics
- Job ID (default:
- Application ID (default:
- Application Attempt ID (default:
- isBarrier flag
ShuffleMapTask is created when
DAGScheduler is requested to submit tasks for all missing partitions of a ShuffleMapStage.
ShuffleMapTask can be given
isBarrier flag when created. Unless given,
isBarrier is assumed disabled (
isBarrier flag is passed to the parent Task.
Serialized Task Binary¶
ShuffleMapTask is given a broadcast variable with a reference to a serialized task binary.
preferredLocations is part of the Task abstraction.
preferredLocs internal property.
ShuffleMapTask initializes the
preferredLocs internal property when created
context: TaskContext): MapStatus
runTask is part of the Task abstraction.
runTask writes the result (records) of executing the serialized task code over the records (in the RDD partition) to the shuffle system and returns a MapStatus (with the BlockManager and an estimated size of the result shuffle blocks).
This is the moment in
Task's lifecycle (and its corresponding RDD) when a RDD partition is computed and in turn becomes a sequence of records (i.e. real data) on an executor.
In case of any exceptions,
runTask requests the
ShuffleWriter to stop (with the
success flag off) and (re)throws the exception.
runTask may also print out the following DEBUG message to the logs when the
ShuffleWriter could not be stopped.
Could not stop writer
ALL logging level for
org.apache.spark.scheduler.ShuffleMapTask logger to see what happens inside.
Add the following line to
logger.ShuffleMapTask.name = org.apache.spark.scheduler.ShuffleMapTask
logger.ShuffleMapTask.level = all
Refer to Logging.