DiskBlockManager creates and maintains the mapping between logical blocks and their physical locations on disk.
By default, one block is mapped to one file with a name given by its
BlockId. It is, however, possible to map a block to only a segment of a file.
Block files are hashed among the local directories.
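The hashing scheme can be sketched as follows. This is a simplified illustration, not the actual Spark code: the helper name, the hash function, and the `%02x` subdirectory naming are assumptions for the sketch.

```scala
import java.io.File

// Simplified sketch of how a block file name is hashed among local directories.
// The helper name and "%02x" subdirectory naming are illustrative assumptions.
def locateFile(filename: String, localDirs: Array[File], subDirsPerLocalDir: Int): File = {
  val hash = filename.hashCode & Integer.MAX_VALUE              // non-negative hash
  val dirId = hash % localDirs.length                           // pick the parent local directory
  val subDirId = (hash / localDirs.length) % subDirsPerLocalDir // then the subdirectory within it
  new File(new File(localDirs(dirId), "%02x".format(subDirId)), filename)
}
```

Because the placement is a pure function of the file name, any component that knows the local directories can recompute where a block file lives without a central index.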
Enable DEBUG logging for org.apache.spark.storage.DiskBlockManager to see what happens inside. Refer to Logging.
getFile(blockId: BlockId): File
getFile(filename: String): File
createTempShuffleBlock(): (TempShuffleBlockId, File)
createTempShuffleBlock creates a temporary TempShuffleBlockId block (backed by a random UUID) and returns it together with its file.
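The idea can be sketched as follows; a simplified sketch, not the real implementation. `TempShuffleBlockId` here is a stand-in for the Spark class of the same name, and `getFile` is passed in as a function rather than being a method on the manager.

```scala
import java.io.File
import java.util.UUID

// Stand-in for Spark's TempShuffleBlockId (illustrative).
case class TempShuffleBlockId(id: UUID) {
  def name: String = "temp_shuffle_" + id
}

// Sketch: keep generating random block ids until the mapped file
// does not already exist on disk, then return the id with its file.
def createTempShuffleBlock(getFile: String => File): (TempShuffleBlockId, File) = {
  var blockId = TempShuffleBlockId(UUID.randomUUID())
  while (getFile(blockId.name).exists()) {
    blockId = TempShuffleBlockId(UUID.randomUUID())
  }
  (blockId, getFile(blockId.name))
}
```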
DiskBlockManager(conf: SparkConf, deleteFilesOnStop: Boolean)
DiskBlockManager uses the spark.diskStore.subDirectories configuration property to set the number of subdirectories per local directory (subDirsPerLocalDir).
DiskBlockManager creates one or many local directories to store block data (as localDirs). When not successful, you should see the following ERROR message in the logs and DiskBlockManager exits with a non-zero error code.
ERROR DiskBlockManager: Failed to create any local dir.
DiskBlockManager initializes the internal subDirs registry: for every local directory where block data is stored, an array of subDirsPerLocalDir entries that serve as locks for block files.
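The shape of that registry can be sketched as below; the counts are illustrative assumptions (in the real code the row count is the number of local directories).

```scala
// Sketch of the subDirs registry built at initialization: a two-dimensional
// array with one row per local directory, each row holding subDirsPerLocalDir
// empty slots that double as locks for lazily created subdirectories.
val localDirsCount = 2        // illustrative; the real value is localDirs.length
val subDirsPerLocalDir = 64   // spark.diskStore.subDirectories
val subDirs: Array[Array[java.io.File]] =
  Array.fill(localDirsCount)(new Array[java.io.File](subDirsPerLocalDir))
```

The slots start out null; a subdirectory is created on disk only when the first block file is hashed into it.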
In the end,
DiskBlockManager registers a shutdown hook to clean up the local directories for blocks.
addShutdownHook registers a shutdown hook to execute doStop at shutdown.
When executed, you should see the following DEBUG message in the logs:
DEBUG DiskBlockManager: Adding shutdown hook
When triggered, the shutdown hook prints the following INFO message and executes doStop.
INFO DiskBlockManager: Shutdown hook called
doStop deletes the local directories recursively (only when the constructor’s
deleteFilesOnStop is enabled and the parent directories are not registered to be removed at shutdown).
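The hook registration and cleanup can be sketched as follows. This uses the plain JVM `Runtime.addShutdownHook`; Spark actually routes this through its own ShutdownHookManager with hook priorities, so treat the sketch as the idea only.

```scala
import java.io.File

// Delete a directory tree bottom-up (helper for the sketch).
def deleteRecursively(f: File): Unit = {
  Option(f.listFiles).foreach(_.foreach(deleteRecursively))
  f.delete()
}

// Sketch: register a JVM shutdown hook that removes the local block
// directories when deleteFilesOnStop is enabled.
def addShutdownHook(localDirs: Array[File], deleteFilesOnStop: Boolean): Thread = {
  val hook = new Thread(() => {
    if (deleteFilesOnStop) localDirs.foreach(deleteRecursively)
  })
  Runtime.getRuntime.addShutdownHook(hook)
  hook
}
```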
getConfiguredLocalDirs(conf: SparkConf): Array[String]
getConfiguredLocalDirs returns the local directories where Spark can write files.
In non-YARN mode (or for the driver in yarn-client mode),
getConfiguredLocalDirs checks the following environment variables (in order) and returns the value of the first one set:
MESOS_DIRECTORY environment variable (only when External Shuffle Service is not used)
In the end, when none of the above environment variables is set,
getConfiguredLocalDirs uses the spark.local.dir Spark property or falls back on the
java.io.tmpdir System property.
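The fallback chain can be sketched as follows. The environment and the spark.local.dir value are passed in explicitly here for testability (the real code reads them from the system and SparkConf), and the exact list and precedence of environment variables is simplified.

```scala
// Sketch of the getConfiguredLocalDirs fallback chain (simplified).
def configuredLocalDirs(env: Map[String, String], sparkLocalDir: Option[String]): Seq[String] =
  env.get("SPARK_LOCAL_DIRS")
    .orElse(env.get("MESOS_DIRECTORY")) // only when External Shuffle Service is off
    .orElse(sparkLocalDir)              // spark.local.dir
    .getOrElse(System.getProperty("java.io.tmpdir"))
    .split(",")
    .toSeq
```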
getYarnLocalDirs(conf: SparkConf): String
conf SparkConf to read
LOCAL_DIRS environment variable with comma-separated local directories (that have already been created and secured so that only the user has access to them).
getYarnLocalDirs throws an Exception with the message
Yarn Local dirs can’t be empty if the
LOCAL_DIRS environment variable is not set.
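That behavior can be sketched as below; the env map is a parameter here purely for testability.

```scala
// Sketch of getYarnLocalDirs: read LOCAL_DIRS and fail when it is unset.
def getYarnLocalDirs(env: Map[String, String]): String = {
  val localDirs = env.getOrElse("LOCAL_DIRS", "")
  if (localDirs.isEmpty) {
    throw new Exception("Yarn Local dirs can't be empty")
  }
  localDirs
}
```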
isRunningInYarnContainer(conf: SparkConf): Boolean
getAllBlocks gets all the blocks stored on disk.
getAllBlocks takes the block files and returns their names (as BlockIds).
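A sketch of the traversal, assuming the localDir/subDir/blockFile layout described above; the helper returns the raw file names, which Spark then parses back into BlockIds.

```scala
import java.io.File

// Sketch of getAllBlocks: walk localDir/subDir/* and collect the file names.
// Null-safe: listFiles returns null for missing or unreadable directories.
def getAllBlockNames(localDirs: Array[File]): Seq[String] =
  localDirs.flatMap { dir =>
    Option(dir.listFiles).getOrElse(Array.empty[File]).flatMap { sub =>
      Option(sub.listFiles).getOrElse(Array.empty[File])
    }
  }.map(_.getName).toSeq
```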
createLocalDirs(conf: SparkConf): Array[File]
createLocalDirs reads the local writable directories and creates a
blockmgr-[random UUID] subdirectory under every configured parent directory to store block data.
If successful, you should see the following INFO message in the logs:
INFO DiskBlockManager: Created local directory at [localDir]
When failed to create a local directory, you should see the following ERROR message in the logs:
ERROR DiskBlockManager: Failed to create local dir in [rootDir]. Ignoring this directory.
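The creation loop can be sketched as follows (a simplified sketch without the logging): directories that cannot be created are skipped, mirroring the "Ignoring this directory" path above.

```scala
import java.io.File
import java.util.UUID

// Sketch of createLocalDirs: create a blockmgr-[random UUID] subdirectory
// under every parent directory, skipping parents where creation fails.
def createLocalDirs(rootDirs: Seq[String]): Array[File] =
  rootDirs.flatMap { root =>
    val dir = new File(root, s"blockmgr-${UUID.randomUUID()}")
    if (dir.mkdirs()) Some(dir) else None
  }.toArray
```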
subDirs is a collection of subDirsPerLocalDir file locks for every local block store directory where
DiskBlockManager stores block data (a two-dimensional array: one row per local directory, each row holding subDirsPerLocalDir entries).
There has to be at least one local directory, or DiskBlockManager fails to initialize.
spark.diskStore.subDirectories configuration property (default: 64) controls the number of subdirectories used for block files in every local directory.