Skip to content

BlockStoreUpdater

BlockStoreUpdater is an abstraction of block store updaters that store blocks (from bytes, whether they start in memory or on disk).

BlockStoreUpdater is an internal class of BlockManager.

Contract

Block Data

blockData(): BlockData

BlockData

Used when:

  • BlockStoreUpdater is requested to save
  • TempFileBasedBlockStoreUpdater is requested to readToByteBuffer

readToByteBuffer

readToByteBuffer(): ChunkedByteBuffer

Used when:

  • BlockStoreUpdater is requested to save

Storing Block to Disk

saveToDiskStore(): Unit

Used when:

  • BlockStoreUpdater is requested to save

Implementations

Creating Instance

BlockStoreUpdater takes the following to be created:

Abstract Class

BlockStoreUpdater is an abstract class and cannot be created directly. It is created indirectly for the concrete BlockStoreUpdaters.

Saving Block to Block Store

save(): Boolean

save doPut with the putBody function.

save is used when:

putBody Function

With the StorageLevel with replication (above 1), the putBody function triggers replication concurrently (using a Future (Scala) on a separate thread from the ExecutionContextExecutorService).

In general, putBody stores the block in the MemoryStore first (if requested based on useMemory of the StorageLevel). putBody saves to a DiskStore (if useMemory is not specified or storing to the MemoryStore failed).

Note

putBody stores the block in the MemoryStore only even if the useMemory and useDisk flags could both be turned on (true).

Spark drops the block to disk later if the memory store can't hold it.

With the useMemory of the StorageLevel set, putBody saveDeserializedValuesToMemoryStore for deserialized storage level or saveSerializedValuesToMemoryStore otherwise.

putBody saves to a DiskStore when either of the following happens:

  1. Storing in memory fails and the useDisk (of the StorageLevel) is set
  2. useMemory of the StorageLevel is not set yet the useDisk is

putBody getCurrentBlockStatus and checks if it is in either the memory or disk store.

In the end, putBody reportBlockStatus (if the given tellMaster flag and the tellMaster flag of the BlockInfo are both enabled) and addUpdatedBlockStatusToTaskMetrics.

putBody prints out the following DEBUG message to the logs:

Put block [blockId] locally took [timeUsed] ms

putBody prints out the following WARN message to the logs when an attempt to store a block in memory fails and the useDisk is set:

Persisting block [blockId] to disk instead.

Saving Deserialized Values to MemoryStore

saveDeserializedValuesToMemoryStore(
  inputStream: InputStream): Boolean

saveDeserializedValuesToMemoryStore...FIXME

saveDeserializedValuesToMemoryStore is used when:

Saving Serialized Values to MemoryStore

saveSerializedValuesToMemoryStore(
  bytes: ChunkedByteBuffer): Boolean

saveSerializedValuesToMemoryStore...FIXME

saveSerializedValuesToMemoryStore is used when:

Logging

BlockStoreUpdater is an abstract class and logging is configured using the logger of the implementations.