BlockStoreUpdater¶
BlockStoreUpdater
is an abstraction of block store updaters that store blocks (from bytes, whether they start in memory or on disk).
BlockStoreUpdater
is an internal class of BlockManager.
Contract¶
Block Data¶
blockData(): BlockData
Used when:
BlockStoreUpdater
is requested to saveTempFileBasedBlockStoreUpdater
is requested to readToByteBuffer
readToByteBuffer¶
readToByteBuffer(): ChunkedByteBuffer
Used when:
BlockStoreUpdater
is requested to save
Storing Block to Disk¶
saveToDiskStore(): Unit
Used when:
BlockStoreUpdater
is requested to save
Implementations¶
Creating Instance¶
BlockStoreUpdater
takes the following to be created:
- Block Size
- BlockId
- StorageLevel
- Scala's
ClassTag
-
tellMaster
flag -
keepReadLock
flag
Abstract Class
BlockStoreUpdater
is an abstract class and cannot be created directly. It is created indirectly for the concrete BlockStoreUpdaters.
Saving Block to Block Store¶
save(): Boolean
save
doPut with the putBody function.
save
is used when:
BlockManager
is requested to putBlockDataAsStream and store block bytes locally
putBody Function¶
With the StorageLevel with replication (above 1
), the putBody
function triggers replication concurrently (using a Future
(Scala) on a separate thread from the ExecutionContextExecutorService).
In general, putBody
stores the block in the MemoryStore first (if requested based on useMemory of the StorageLevel). putBody
saves to a DiskStore (if useMemory is not specified or storing to the MemoryStore
failed).
Note
putBody
stores the block in the MemoryStore
only even if the useMemory and useDisk flags could both be turned on (true
).
Spark drops the block to disk later if the memory store can't hold it.
With the useMemory of the StorageLevel set, putBody
saveDeserializedValuesToMemoryStore for deserialized storage level or saveSerializedValuesToMemoryStore otherwise.
putBody
saves to a DiskStore when either of the following happens:
- Storing in memory fails and the useDisk (of the StorageLevel) is set
- useMemory of the StorageLevel is not set yet the useDisk is
putBody
getCurrentBlockStatus and checks if it is in either the memory or disk store.
In the end, putBody
reportBlockStatus (if the given tellMaster flag and the tellMaster flag of the BlockInfo
are both enabled) and addUpdatedBlockStatusToTaskMetrics.
putBody
prints out the following DEBUG message to the logs:
Put block [blockId] locally took [timeUsed] ms
putBody
prints out the following WARN message to the logs when an attempt to store a block in memory fails and the useDisk is set:
Persisting block [blockId] to disk instead.
Saving Deserialized Values to MemoryStore¶
saveDeserializedValuesToMemoryStore(
inputStream: InputStream): Boolean
saveDeserializedValuesToMemoryStore
...FIXME
saveDeserializedValuesToMemoryStore
is used when:
BlockStoreUpdater
is requested to save a block (with memory deserialized storage level)
Saving Serialized Values to MemoryStore¶
saveSerializedValuesToMemoryStore(
bytes: ChunkedByteBuffer): Boolean
saveSerializedValuesToMemoryStore
...FIXME
saveSerializedValuesToMemoryStore
is used when:
BlockStoreUpdater
is requested to save a block (with memory serialized storage level)
Logging¶
BlockStoreUpdater
is an abstract class and logging is configured using the logger of the implementations.