Skip to content

DiskStore

DiskStore manages data blocks on disk for BlockManager.

DiskStore and BlockManager

Creating Instance

DiskStore takes the following to be created:

DiskStore is created for BlockManager.

Block Sizes

blockSizes: ConcurrentHashMap[BlockId, Long]

DiskStore uses ConcurrentHashMap (Java) as a registry of blocks and the data size (on disk).

A new entry is added when put and moveFileToBlock.

An entry is removed when remove.

putBytes

putBytes(
  blockId: BlockId,
  bytes: ChunkedByteBuffer): Unit

putBytes put the block and writes the buffer out (to the given channel).

putBytes is used when:

getBytes

getBytes(
  blockId: BlockId): BlockData
getBytes(
  f: File,
  blockSize: Long): BlockData

getBytes requests the DiskBlockManager for the block file and the size.

getBytes requests the SecurityManager for getIOEncryptionKey and returns a EncryptedBlockData if available or a DiskBlockData otherwise.

getBytes is used when:

getSize

getSize(
  blockId: BlockId): Long

getSize looks up the block in the blockSizes registry.

getSize is used when:

moveFileToBlock

moveFileToBlock(
  sourceFile: File,
  blockSize: Long,
  targetBlockId: BlockId): Unit

moveFileToBlock...FIXME

moveFileToBlock is used when:

Checking if Block File Exists

contains(
  blockId: BlockId): Boolean

contains requests the DiskBlockManager for the block file and checks whether the file actually exists or not.

contains is used when:

Persisting Block to Disk

put(
  blockId: BlockId)(
  writeFunc: WritableByteChannel => Unit): Unit

put prints out the following DEBUG message to the logs:

Attempting to put block [blockId]

put requests the DiskBlockManager for the block file for the input BlockId.

put opens the block file for writing (wrapped into a CountingWritableChannel to count the bytes written). put executes the given writeFunc function (with the WritableByteChannel of the block file) and saves the bytes written (to the blockSizes registry).

In the end, put prints out the following DEBUG message to the logs:

Block [fileName] stored as [size] file on disk in [time] ms

In case of any exception, put deletes the block file.

put throws an IllegalStateException when the block is already stored:

Block [blockId] is already present in the disk store

put is used when:

Removing Block

remove(
  blockId: BlockId): Boolean

remove...FIXME

remove is used when:

  • BlockManager is requested to removeBlockInternal
  • DiskStore is requested to put (and an IOException is thrown)

Logging

Enable ALL logging level for org.apache.spark.storage.DiskStore logger to see what happens inside.

Add the following line to conf/log4j.properties:

log4j.logger.org.apache.spark.storage.DiskStore=ALL

Refer to Logging.