DiskBlockManager creates and maintains the logical mapping between logical blocks and physical on-disk locations for a BlockManager.
By default, one block is mapped to one file with a name given by its
BlockId. It is however possible to have a block map to only a segment of a file.
Block files are hashed among the local directories.
DiskBlockManager is used to create a DiskStore.
DiskBlockManager takes the following to be created:
When created, DiskBlockManager creates one or many local directories to store block data and initializes the internal subDirs collection of locks for every local directory.
In the end, DiskBlockManager registers a shutdown hook to clean up the local directories for blocks.
While being created, DiskBlockManager creates local directories for block data. DiskBlockManager expects at least one local directory to be created successfully; otherwise, it prints out the following ERROR message to the logs and exits the JVM (with exit code 53):
Failed to create any local dir.
localDirs is used when:
subDirs is a lookup table for file locks of every local block directory (with the first dimension for local directories and the second for locks).
The number of block subdirectories is controlled by the spark.diskStore.subDirectories configuration property (default: 64).
subDirs(dirId)(subDirId) is used to access the subDirId subdirectory of the dirId local directory.
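The two-level layout can be sketched as follows (a minimal illustration; the names and the use of String instead of the actual File-based fields are simplifications, not DiskBlockManager's private members):

```scala
// Sketch of the two-level subDirs lookup table (illustrative names,
// not the actual private fields of DiskBlockManager).
val localDirs: Array[String] = Array("/tmp/blockmgr-aaaa", "/tmp/blockmgr-bbbb")
val subDirsPerLocalDir = 64 // spark.diskStore.subDirectories default

// One slot per (local directory, subdirectory) pair; filled lazily.
val subDirs: Array[Array[String]] =
  Array.fill(localDirs.length)(new Array[String](subDirsPerLocalDir))

// subDirs(dirId)(subDirId) addresses subdirectory subDirId of local dir dirId.
subDirs(1)(5) = s"${localDirs(1)}/05"
```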
createLocalDirs(conf: SparkConf): Array[File]
createLocalDirs creates a blockmgr-[random UUID] directory under every local directory to store block data.

Internally, createLocalDirs finds the configured local directories where Spark can write files and creates a blockmgr-[UUID] subdirectory under every configured parent directory.
For every local directory, createLocalDirs prints out the following INFO message to the logs:
Created local directory at [localDir]
In case of an exception, createLocalDirs prints out the following ERROR message to the logs and skips the directory.
Failed to create local dir in [rootDir]. Ignoring this directory.
createLocalDirs is used when the localDirs internal registry is initialized.
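The behavior above can be sketched like this (a hedged approximation: the helper takes the parent directories directly instead of a SparkConf, and uses println where the real code logs at INFO/ERROR):

```scala
import java.io.File
import java.util.UUID

// Sketch of createLocalDirs: for every configured parent directory,
// create a blockmgr-[random UUID] subdirectory; skip a directory on failure.
def createLocalDirs(rootDirs: Seq[String]): Array[File] =
  rootDirs.flatMap { rootDir =>
    try {
      val localDir = new File(rootDir, s"blockmgr-${UUID.randomUUID}")
      if (!localDir.mkdirs()) throw new java.io.IOException(s"Failed to create $localDir")
      println(s"Created local directory at $localDir") // INFO in the real code
      Some(localDir)
    } catch {
      case _: Exception =>
        println(s"Failed to create local dir in $rootDir. Ignoring this directory.") // ERROR
        None
    }
  }.toArray
```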
getFile(blockId: BlockId): File // (1)
getFile(filename: String): File

1. Uses the name of the given BlockId
getFile computes a hash of the file name of the input BlockId that is used for the name of the parent directory and subdirectory.
getFile creates the subdirectory unless it already exists.
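The hash-based placement can be sketched as follows (an approximation of Spark's scheme: the real code uses Utils.nonNegativeHash and caches created subdirectories in subDirs; the parameters here are passed in explicitly for illustration):

```scala
import java.io.File

// Sketch of getFile: a non-negative hash of the file name picks the
// parent local directory and the subdirectory within it.
def getFile(localDirs: Array[File], subDirsPerLocalDir: Int, filename: String): File = {
  val hash = math.abs(filename.hashCode) // Utils.nonNegativeHash in Spark
  val dirId = hash % localDirs.length
  val subDirId = (hash / localDirs.length) % subDirsPerLocalDir
  val subDir = new File(localDirs(dirId), "%02x".format(subDirId))
  subDir.mkdirs() // created unless it already exists
  new File(subDir, filename)
}
```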
getFile is used when:
createTempShuffleBlock(): (TempShuffleBlockId, File)
createTempShuffleBlock creates a temporary TempShuffleBlockId block (with a random UUID) and the file to store it.
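A minimal sketch of that loop (hedged: getFileFor stands in for DiskBlockManager.getFile, and the block id is modeled as a plain String rather than a TempShuffleBlockId):

```scala
import java.io.File
import java.util.UUID

// Sketch of createTempShuffleBlock: keep generating random
// temp_shuffle_[UUID] names until one maps to a non-existent file.
def createTempShuffleBlock(getFileFor: String => File): (String, File) = {
  var blockId = s"temp_shuffle_${UUID.randomUUID}"
  while (getFileFor(blockId).exists()) {
    blockId = s"temp_shuffle_${UUID.randomUUID}"
  }
  (blockId, getFileFor(blockId))
}
```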
addShutdownHook registers a shutdown hook to execute doStop at shutdown.
When executed, you should see the following DEBUG message in the logs:
DEBUG DiskBlockManager: Adding shutdown hook
addShutdownHook adds a shutdown hook that, when triggered, prints the following INFO message to the logs and executes doStop:
INFO DiskBlockManager: Shutdown hook called
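The registration can be sketched like this (a simplification: the real code goes through Spark's ShutdownHookManager with a priority, and logs instead of using println):

```scala
// Sketch of addShutdownHook: register a JVM shutdown hook that logs
// and runs the cleanup (doStop in DiskBlockManager).
def addShutdownHook(doStop: () => Unit): Thread = {
  println("Adding shutdown hook") // DEBUG in the real code
  val hook = new Thread(() => {
    println("Shutdown hook called") // INFO in the real code
    doStop()
  }, "delete Spark local dirs")
  Runtime.getRuntime.addShutdownHook(hook)
  hook
}
```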
getConfiguredLocalDirs(conf: SparkConf): Array[String]
getConfiguredLocalDirs returns the local directories where Spark can write files.
In non-YARN mode (or for the driver in yarn-client mode),
getConfiguredLocalDirs checks the following environment variables (in order) and returns the value of the first one that is set:
MESOS_DIRECTORY environment variable (only when External Shuffle Service is not used)
In the end, when none of the environment variables above is set, getConfiguredLocalDirs uses the spark.local.dir Spark property or falls back on the java.io.tmpdir System property.
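The precedence can be sketched as follows (a hedged sketch of the non-YARN path only; the environment-variable names and their order reflect Spark's checks as described above, and the env map is passed in explicitly for testability):

```scala
// Sketch of the non-YARN precedence: the first environment variable
// that is set wins, then spark.local.dir, then java.io.tmpdir.
def configuredLocalDirs(env: Map[String, String], sparkLocalDir: Option[String]): String =
  Seq("SPARK_EXECUTOR_DIRS", "SPARK_LOCAL_DIRS", "MESOS_DIRECTORY")
    .flatMap(env.get)   // keep only the variables that are set
    .headOption         // first match wins
    .orElse(sparkLocalDir)
    .getOrElse(System.getProperty("java.io.tmpdir"))
```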
getYarnLocalDirs(conf: SparkConf): String
getYarnLocalDirs reads the LOCAL_DIRS environment variable with comma-separated local directories (that have already been created and secured so that only the user has access to them).
getYarnLocalDirs throws an Exception with the message "Yarn Local dirs can't be empty" if the LOCAL_DIRS environment variable was not set.
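A minimal sketch of that check (the env map stands in for reading the process environment, so the behavior is easy to exercise):

```scala
// Sketch of getYarnLocalDirs: read LOCAL_DIRS and fail when it is unset or empty.
def getYarnLocalDirs(env: Map[String, String]): String = {
  val localDirs = env.getOrElse("LOCAL_DIRS", "")
  if (localDirs.isEmpty) throw new Exception("Yarn Local dirs can't be empty")
  localDirs
}
```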
Enable ALL logging level for the org.apache.spark.storage.DiskBlockManager logger to see what happens inside. Add the following line to conf/log4j.properties:

log4j.logger.org.apache.spark.storage.DiskBlockManager=ALL

Refer to Logging.