TaskResults — DirectTaskResult and IndirectTaskResult

TaskResult models a task result. It has exactly two concrete implementations:

  1. DirectTaskResult is the TaskResult that is serialized and sent over the wire to the driver. It carries the result bytes and the accumulator updates.

  2. IndirectTaskResult is the TaskResult that is just a pointer to a task result in a BlockManager.

Which concrete TaskResult is used is decided when a TaskRunner finishes running a task and checks the size of the result.

The types are private[spark].

DirectTaskResult Task Result

DirectTaskResult[T](
  var valueBytes: ByteBuffer,
  var accumUpdates: Seq[AccumulatorV2[_, _]])
extends TaskResult[T] with Externalizable

DirectTaskResult is the TaskResult of running a task (that is later returned serialized to the driver) when the size of the task’s result is smaller than both spark.driver.maxResultSize and the smaller of spark.task.maxDirectResultSize and spark.rpc.message.maxSize.
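
The size check described above can be sketched as a small pure function. This is a minimal sketch and not Spark's own code: the object TaskResultChoice, the Direct and Indirect markers, and the parameter names are hypothetical, all sizes are assumed to be in bytes, and two details are assumptions of the sketch (a driver limit of 0 is treated as "unlimited", and a result exactly at a limit is treated as direct).

```scala
// Hypothetical stand-ins for illustration; not Spark's private[spark] classes.
object TaskResultChoice {
  sealed trait Choice
  case object Direct extends Choice   // result would travel to the driver as a DirectTaskResult
  case object Indirect extends Choice // result would be referenced through an IndirectTaskResult

  /**
   * Picks the kind of task result from the serialized result size.
   * All sizes are in bytes.
   */
  def choose(
      resultSize: Long,
      maxResultSize: Long,       // stands for spark.driver.maxResultSize
      maxDirectResultSize: Long, // stands for spark.task.maxDirectResultSize
      maxRpcMessageSize: Long    // stands for spark.rpc.message.maxSize
  ): Choice = {
    // The direct limit is the smaller of the two per-message limits.
    val directLimit = math.min(maxDirectResultSize, maxRpcMessageSize)
    // Assumption of this sketch: a driver limit of 0 disables the check.
    val exceedsDriverLimit = maxResultSize > 0 && resultSize > maxResultSize
    // Assumption of this sketch: a size exactly at the limit is still direct.
    if (!exceedsDriverLimit && resultSize <= directLimit) Direct else Indirect
  }
}
```

With illustrative limits of 1 GiB for the driver, 1 MiB for direct results and 128 MiB for RPC messages, a 512 KiB result is classified as Direct and a 2 MiB result as Indirect.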

DirectTaskResult implements Java’s java.io.Externalizable.
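
To illustrate what java.io.Externalizable asks of a class, the following sketch writes and reads a ByteBuffer field by hand. ResultEnvelope and ExternalizableRoundTrip are hypothetical names used only here. The sketch does not reproduce how DirectTaskResult encodes its fields; it only shows the contract: a public no-argument constructor plus writeExternal and readExternal.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, Externalizable}
import java.io.{ObjectInput, ObjectInputStream, ObjectOutput, ObjectOutputStream}
import java.nio.ByteBuffer

// Hypothetical stand-in for illustration; not Spark's DirectTaskResult.
class ResultEnvelope(var valueBytes: ByteBuffer) extends Externalizable {

  // Externalizable requires a public no-argument constructor for deserialization.
  def this() = this(null)

  override def writeExternal(out: ObjectOutput): Unit = {
    // Work on a duplicate so the original buffer's position is left untouched.
    val view = valueBytes.duplicate()
    val bytes = new Array[Byte](view.remaining())
    view.get(bytes)
    out.writeInt(bytes.length) // length prefix, so the reader knows how much to read
    out.write(bytes)
  }

  override def readExternal(in: ObjectInput): Unit = {
    val bytes = new Array[Byte](in.readInt())
    in.readFully(bytes)
    valueBytes = ByteBuffer.wrap(bytes)
  }
}

object ExternalizableRoundTrip {
  // Serializes the envelope with Java serialization and reads it back.
  def roundTrip(envelope: ResultEnvelope): ResultEnvelope = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    try out.writeObject(envelope) finally out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray))
    try in.readObject().asInstanceOf[ResultEnvelope] finally in.close()
  }
}
```

Unlike plain Serializable, the class itself decides exactly which bytes go on the wire, which is why the length prefix has to be written explicitly.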

IndirectTaskResult Task Result

IndirectTaskResult[T](blockId: BlockId, size: Int)
extends TaskResult[T] with Serializable

IndirectTaskResult is a TaskResult that…​

IndirectTaskResult implements Java’s java.io.Serializable.
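
The idea of a result that is "just a pointer" can be sketched with an in-memory map standing in for a BlockManager. Everything in this sketch is hypothetical (IndirectResultSketch, BlockStore, the String block ids and the resolve helper); it only illustrates that a direct result already holds its bytes while an indirect one has to be looked up by block id.

```scala
import scala.collection.mutable

// Hypothetical model for illustration; not Spark's TaskResult hierarchy.
object IndirectResultSketch {
  sealed trait Result
  final case class Direct(valueBytes: Array[Byte]) extends Result
  final case class Indirect(blockId: String, size: Int) extends Result

  // In-memory stand-in for a BlockManager, keyed by block id.
  final class BlockStore {
    private val blocks = mutable.Map.empty[String, Array[Byte]]
    def put(blockId: String, bytes: Array[Byte]): Unit = blocks.update(blockId, bytes)
    def get(blockId: String): Option[Array[Byte]] = blocks.get(blockId)
  }

  // A direct result carries its bytes; an indirect one is followed to the store.
  // Returns None when the referenced block is not in the store.
  def resolve(result: Result, store: BlockStore): Option[Array[Byte]] = result match {
    case Direct(bytes)        => Some(bytes)
    case Indirect(blockId, _) => store.get(blockId)
  }
}
```

A caller would first put the result bytes into the store under some block id and hand out only Indirect(blockId, size); resolving it later returns the stored bytes, or None if the block is gone.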