BlockStoreUpdater¶
BlockStoreUpdater is an abstraction of block store updaters that store blocks (from bytes, whether they start in memory or on disk).
BlockStoreUpdater is an internal class of BlockManager.
Contract¶
Block Data¶
blockData(): BlockData
Used when:
- BlockStoreUpdater is requested to save
- TempFileBasedBlockStoreUpdater is requested to readToByteBuffer
readToByteBuffer¶
readToByteBuffer(): ChunkedByteBuffer
Used when:
- BlockStoreUpdater is requested to save
Storing Block to Disk¶
saveToDiskStore(): Unit
Used when:
- BlockStoreUpdater is requested to save
Implementations¶
- ByteBufferBlockStoreUpdater
- TempFileBasedBlockStoreUpdater
Creating Instance¶
BlockStoreUpdater takes the following to be created:
- Block Size
- BlockId
- StorageLevel
- Scala's ClassTag
- tellMaster flag
- keepReadLock flag
Abstract Class
BlockStoreUpdater is an abstract class and cannot be created directly. It is created indirectly for the concrete BlockStoreUpdaters.
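The contract and the constructor arguments above can be pictured with a minimal, self-contained sketch. It is not Spark's code: `LevelSketch`, `UpdaterSketch` and `InMemoryUpdater` are hypothetical stand-ins, block data is modelled as a plain byte array instead of BlockData and ChunkedByteBuffer, and the contract methods are public here only so they are easy to call.

```scala
import java.nio.ByteBuffer
import scala.reflect.ClassTag

object BlockStoreUpdaterSketch {
  // Hypothetical stand-in for StorageLevel (only the flags this page mentions)
  final case class LevelSketch(
      useMemory: Boolean,
      useDisk: Boolean,
      deserialized: Boolean,
      replication: Int)

  // Simplified model of the abstract class: six constructor arguments
  // and the three abstract methods of the contract.
  abstract class UpdaterSketch[T](
      val blockSize: Long,
      val blockId: String, // Spark uses BlockId; a String is an assumption of this sketch
      val level: LevelSketch,
      val classTag: ClassTag[T],
      val tellMaster: Boolean,
      val keepReadLock: Boolean) {
    def blockData(): Array[Byte]       // stand-in for blockData(): BlockData
    def readToByteBuffer(): ByteBuffer // stand-in for readToByteBuffer(): ChunkedByteBuffer
    def saveToDiskStore(): Unit
  }

  // Hypothetical concrete updater whose bytes already live in memory
  final class InMemoryUpdater(bytes: Array[Byte], id: String, storageLevel: LevelSketch)
      extends UpdaterSketch[Byte](
        bytes.length.toLong, id, storageLevel, implicitly[ClassTag[Byte]],
        tellMaster = true, keepReadLock = false) {
    var savedToDisk = false
    def blockData(): Array[Byte] = bytes
    def readToByteBuffer(): ByteBuffer = ByteBuffer.wrap(blockData())
    // Only records that a disk write was requested; no file is written
    def saveToDiskStore(): Unit = { savedToDisk = true }
  }
}
```

Because the abstract class cannot be instantiated, callers in this sketch always go through a concrete subclass such as `InMemoryUpdater`, which mirrors how the concrete BlockStoreUpdaters are the only way to obtain an instance.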
Saving Block to Block Store¶
save(): Boolean
save calls doPut with the putBody function.
save is used when:
- BlockManager is requested to putBlockDataAsStream and store block bytes locally
putBody Function¶
When the StorageLevel has replication above 1, the putBody function triggers replication concurrently (using a Scala Future on a separate thread from the ExecutionContextExecutorService).
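The concurrent replication step can be illustrated with a small sketch, assuming nothing about Spark's internals beyond what is stated above. `ReplicationSketch`, `replicate` and `putWithReplication` are hypothetical names, the thread pool is created locally just for the demonstration, and waiting for the future before returning is a choice of this sketch.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, ExecutionContextExecutorService, Future}
import scala.concurrent.duration._

object ReplicationSketch {
  // Hypothetical stand-in for sending the block to peers
  def replicate(blockId: String): String = s"replicated $blockId"

  // Returns the local result and, if replication was requested, the remote one
  def putWithReplication(blockId: String, replication: Int): (String, Option[String]) = {
    val pool: ExecutionContextExecutorService =
      ExecutionContext.fromExecutorService(Executors.newCachedThreadPool())
    try {
      // Start replication on the pool only when more than one replica is requested
      val replicationFuture: Option[Future[String]] =
        if (replication > 1) Some(Future(replicate(blockId))(pool)) else None
      // The local store proceeds while replication runs on another thread
      val local = s"stored $blockId locally"
      val remote = replicationFuture.map(f => Await.result(f, 10.seconds))
      (local, remote)
    } finally {
      pool.shutdown()
    }
  }
}
```

Passing the execution context explicitly (rather than importing a global one) keeps the replication work on a dedicated pool, which is the property the paragraph above describes.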
In general, putBody stores the block in the MemoryStore first (if requested based on useMemory of the StorageLevel). putBody saves to a DiskStore (if useMemory is not specified or storing to the MemoryStore failed).
Note
putBody stores the block in the MemoryStore only, even when both the useMemory and useDisk flags are enabled (true).
Spark drops the block to disk later if the memory store can't hold it.
With the useMemory of the StorageLevel set, putBody calls saveDeserializedValuesToMemoryStore for a deserialized storage level or saveSerializedValuesToMemoryStore otherwise.
putBody saves to a DiskStore when either of the following happens:
- Storing in memory fails and useDisk (of the StorageLevel) is set
- useMemory of the StorageLevel is not set, yet useDisk is
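The memory-first, disk-as-fallback decision described above can be condensed into a short sketch. It is a simplified model, not Spark's implementation: `Level`, `Outcome` and `store` are hypothetical names, and `memoryPut` is an assumed callback that simulates whether the memory store accepted the block.

```scala
object PutBodySketch {
  // Hypothetical stand-in for the StorageLevel flags used by the decision
  final case class Level(useMemory: Boolean, useDisk: Boolean, deserialized: Boolean)

  sealed trait Outcome
  case object InMemoryDeserialized extends Outcome
  case object InMemorySerialized extends Outcome
  case object OnDisk extends Outcome
  case object NotStored extends Outcome

  def store(level: Level, memoryPut: () => Boolean): Outcome = {
    if (level.useMemory) {
      // Memory is tried first, even when useDisk is also set
      val putSucceeded = memoryPut()
      if (putSucceeded) {
        if (level.deserialized) InMemoryDeserialized else InMemorySerialized
      } else if (level.useDisk) {
        // Fallback: storing in memory failed and useDisk is set
        OnDisk
      } else {
        NotStored
      }
    } else if (level.useDisk) {
      // useMemory is not set, yet useDisk is
      OnDisk
    } else {
      NotStored
    }
  }
}
```

For example, `PutBodySketch.store(PutBodySketch.Level(useMemory = true, useDisk = true, deserialized = false), () => false)` yields `OnDisk`, matching the first bullet above.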
putBody requests the current block status (getCurrentBlockStatus) and checks whether the block is in either the memory or disk store.
In the end, putBody calls reportBlockStatus (if the given tellMaster flag and the tellMaster flag of the BlockInfo are both enabled) and addUpdatedBlockStatusToTaskMetrics.
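The final bookkeeping can be sketched as follows. `ReportSketch`, `Status` and `finish` are hypothetical names, the returned list merely records which steps would run, and gating both steps on the block actually being stored is an assumption of this sketch rather than something stated above.

```scala
object ReportSketch {
  // Hypothetical stand-in for the block status checked after the put
  final case class Status(inMemory: Boolean, onDisk: Boolean) {
    def isStored: Boolean = inMemory || onDisk
  }

  // Returns the names of the follow-up steps, in the order they would run
  def finish(status: Status, tellMaster: Boolean, infoTellMaster: Boolean): List[String] = {
    if (!status.isStored) {
      Nil // assumption: nothing to report when the block is in neither store
    } else {
      // Reporting requires BOTH the given flag and the BlockInfo flag
      val report =
        if (tellMaster && infoTellMaster) List("reportBlockStatus") else Nil
      report :+ "addUpdatedBlockStatusToTaskMetrics"
    }
  }
}
```

In this model, disabling either tellMaster flag skips the report to the master but still updates the task metrics.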
putBody prints out the following DEBUG message to the logs:
Put block [blockId] locally took [timeUsed] ms
putBody prints out the following WARN message to the logs when an attempt to store a block in memory fails and the useDisk is set:
Persisting block [blockId] to disk instead.
Saving Deserialized Values to MemoryStore¶
saveDeserializedValuesToMemoryStore(
inputStream: InputStream): Boolean
saveDeserializedValuesToMemoryStore...FIXME
saveDeserializedValuesToMemoryStore is used when:
- BlockStoreUpdater is requested to save a block (with memory deserialized storage level)
Saving Serialized Values to MemoryStore¶
saveSerializedValuesToMemoryStore(
bytes: ChunkedByteBuffer): Boolean
saveSerializedValuesToMemoryStore...FIXME
saveSerializedValuesToMemoryStore is used when:
- BlockStoreUpdater is requested to save a block (with memory serialized storage level)
Logging¶
BlockStoreUpdater is an abstract class and logging is configured using the logger of the implementations.