DiskBlockObjectWriter

Whenever DiskBlockObjectWriter is requested to write a key-value pair, it makes sure that the underlying output streams are open.

DiskBlockObjectWriter can be in the following states (that match the state of the underlying output streams):

  1. Initialized

  2. Open

  3. Closed

Table 1. DiskBlockObjectWriter’s Internal Registries and Counters
Name Description

initialized

Internal flag…​FIXME

Used when…​FIXME

hasBeenClosed

Internal flag…​FIXME

Used when…​FIXME

streamOpen

Internal flag…​FIXME

Used when…​FIXME

objOut

FIXME

Used when…​FIXME

mcs

FIXME

Used when…​FIXME

bs

FIXME

Used when…​FIXME

blockId

FIXME

Used when…​FIXME

DiskBlockObjectWriter is a private[spark] class.

updateBytesWritten Method

FIXME

initialize Method

FIXME

Writing Bytes (From Byte Array Starting From Offset) — write Method

write(kvBytes: Array[Byte], offs: Int, len: Int): Unit

write…​FIXME

FIXME

recordWritten Method

FIXME

commitAndGet Method

commitAndGet(): FileSegment
commitAndGet is used when…​FIXME

close Method

FIXME

Creating DiskBlockObjectWriter Instance

DiskBlockObjectWriter takes the following when created:

  1. file

  2. serializerManager — SerializerManager

  3. serializerInstance — SerializerInstance

  4. bufferSize

  5. syncWrites flag

  6. writeMetrics — ShuffleWriteMetrics

  7. blockId — BlockId

DiskBlockObjectWriter initializes the internal registries and counters.

Writing Key-Value Pair — write Method

write(key: Any, value: Any): Unit

Before writing, write opens the stream unless already open.

write then writes the key first followed by writing the value.

In the end, write recordWritten.

write is used when BypassMergeSortShuffleWriter writes records and in ExternalAppendOnlyMap, ExternalSorter and WritablePartitionedPairCollection.

Opening DiskBlockObjectWriter — open Method

open(): DiskBlockObjectWriter

open opens DiskBlockObjectWriter, i.e. initializes and re-sets bs and objOut internal output streams.

Internally, open makes sure that DiskBlockObjectWriter is not closed (i.e. hasBeenClosed flag is disabled). If it was, open throws a IllegalStateException:

Writer already closed. Cannot be reopened.

Unless DiskBlockObjectWriter has already been initialized (i.e. initialized flag is enabled), open initializes it (and turns initialized flag on).

Regardless of whether DiskBlockObjectWriter was already initialized or not, open requests SerializerManager to wrap mcs output stream for encryption and compression (for blockId) and sets it as bs.

open uses SerializerManager that was specified when DiskBlockObjectWriter was created
open uses SerializerInstance that was specified when DiskBlockObjectWriter was created

In the end, open turns streamOpen flag on.

open is used exclusively when DiskBlockObjectWriter writes a key-value pair or bytes from a specified byte array but the stream is not open yet.