ResourceProfile

ResourceProfile is a resource profile (with executor and task requirements) for Stage Level Scheduling.

ResourceProfile is associated with an RDD using the withResources operator.
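As a minimal sketch of how such a profile might be built in user code, the following uses ResourceProfileBuilder together with ExecutorResourceRequests and TaskResourceRequests from the org.apache.spark.resource package. The object name ResourceProfileSketch, the method name buildProfile, and the resource amounts are hypothetical and chosen only for illustration.

```scala
import org.apache.spark.resource.{ExecutorResourceRequests, ResourceProfile, ResourceProfileBuilder, TaskResourceRequests}

// Hypothetical helper object, not part of Spark
object ResourceProfileSketch {

  def buildProfile(): ResourceProfile = {
    // Executor requirements: 4 cores and 6 GiB of memory (illustrative values)
    val executorReqs = new ExecutorResourceRequests()
      .cores(4)
      .memory("6g")

    // Task requirements: 2 CPUs per task (illustrative value)
    val taskReqs = new TaskResourceRequests().cpus(2)

    // Combine both sets of requirements into a single ResourceProfile
    new ResourceProfileBuilder()
      .require(executorReqs)
      .require(taskReqs)
      .build
  }
}
```

The resulting profile would then be passed to an RDD with rdd.withResources(profile). Whether the profile is accepted at that point likely depends on the cluster manager and the Spark version in use, so treat that step as something to verify in your environment.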

Creating Instance

ResourceProfile takes the following to be created:

  • Executor Resources (Map[String, ExecutorResourceRequest])
  • Task Resources (Map[String, TaskResourceRequest])

ResourceProfile is created (directly or using getOrCreateDefaultProfile) when:

Serializable

ResourceProfile is a Java Serializable.

Default Profile

ResourceProfile (object) defines the defaultProfile internal registry that holds the default ResourceProfile (one per JVM instance).

defaultProfile is None (undefined) by default and gets a new ResourceProfile in getOrCreateDefaultProfile.

defaultProfile is available using getOrCreateDefaultProfile.

defaultProfile is cleared (removed) in clearDefaultProfile.

getOrCreateDefaultProfile

getOrCreateDefaultProfile(
  conf: SparkConf): ResourceProfile

getOrCreateDefaultProfile returns the default profile (if defined) or creates a new one.

If undefined, getOrCreateDefaultProfile creates a ResourceProfile with the default task and executor resources and makes it the defaultProfile.

getOrCreateDefaultProfile prints out the following INFO message to the logs:

Default ResourceProfile created,
executor resources: [executorResources], task resources: [taskResources]
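The get-or-create behaviour described in this section can be sketched in plain Scala as follows. This is not Spark's source code: Profile is a hypothetical stand-in for ResourceProfile, DefaultProfileRegistry, getOrCreate and clear are illustrative names, and the use of synchronized is an assumption about how the per-JVM registry could be protected.

```scala
// Hypothetical stand-in for ResourceProfile, used only in this sketch
final case class Profile(
    executorResources: Map[String, Long],
    taskResources: Map[String, Double])

object DefaultProfileRegistry {
  // None (undefined) until the first getOrCreate call
  private var defaultProfile: Option[Profile] = None

  // Returns the registered default profile or creates and registers one
  def getOrCreate(create: () => Profile): Profile = synchronized {
    defaultProfile match {
      case Some(existing) => existing
      case None =>
        val created = create()
        defaultProfile = Some(created)
        // Mirrors the INFO message quoted above
        println("Default ResourceProfile created, " +
          s"executor resources: ${created.executorResources}, " +
          s"task resources: ${created.taskResources}")
        created
    }
  }

  // Counterpart of clearDefaultProfile: the registry becomes undefined again
  def clear(): Unit = synchronized {
    defaultProfile = None
  }
}
```

Under these assumptions, a second call to getOrCreate returns the instance created by the first call and never invokes its create function, until clear is called.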

getOrCreateDefaultProfile is used when:

Default Executor Resources

getDefaultExecutorResources(
  conf: SparkConf): Map[String, ExecutorResourceRequest]

getDefaultExecutorResources creates an ExecutorResourceRequests with the following:

Property        Configuration Property
cores           spark.executor.cores
memory          spark.executor.memory
memoryOverhead  spark.executor.memoryOverhead
pysparkMemory   spark.executor.pyspark.memory
offHeapMemory   spark.memory.offHeap.size

getDefaultExecutorResources finds executor resource requests (with the spark.executor component name in the given SparkConf) for ExecutorResourceRequests.

getDefaultExecutorResources initializes the defaultProfileExecutorResources (with the executor resource requests).

In the end, getDefaultExecutorResources requests the ExecutorResourceRequests for all the resource requests.
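For illustration, a SparkConf that sets the configuration properties listed above might look like the following sketch. The object name DefaultExecutorResourcesConf, the method name exampleConf, and all values are hypothetical. The spark.executor.resource.gpu.amount entry is included only as an assumed example of a custom executor resource request under the spark.executor component name.

```scala
import org.apache.spark.SparkConf

// Hypothetical helper object, not part of Spark
object DefaultExecutorResourcesConf {

  def exampleConf(): SparkConf =
    // loadDefaults = false keeps the sketch independent of JVM system properties
    new SparkConf(false)
      .set("spark.executor.cores", "4")               // cores
      .set("spark.executor.memory", "8g")             // memory
      .set("spark.executor.memoryOverhead", "1g")     // memoryOverhead
      .set("spark.executor.pyspark.memory", "2g")     // pysparkMemory
      .set("spark.memory.offHeap.size", "512m")       // offHeapMemory
      // Assumed example of a custom resource request (1 GPU per executor)
      .set("spark.executor.resource.gpu.amount", "1")
}
```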

getResourcesForClusterManager

getResourcesForClusterManager(
  rpId: Int,
  execResources: Map[String, ExecutorResourceRequest],
  overheadFactor: Double,
  conf: SparkConf,
  isPythonApp: Boolean,
  resourceMappings: Map[String, String]): ExecutorResourcesOrDefaults

getResourcesForClusterManager takes the DefaultProfileExecutorResources.

getResourcesForClusterManager calculates the overhead memory with the following:

  • memoryOverheadMiB and executorMemoryMiB of the DefaultProfileExecutorResources
  • Given overheadFactor
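The overhead calculation from these inputs can be sketched in plain Scala. Two details here are assumptions rather than statements from this page: that an explicitly configured memoryOverheadMiB takes precedence over the factor-based value, and that the factor-based value has a floor of 384 MiB. The names OverheadSketch, MinOverheadMiB and overheadMiB are hypothetical.

```scala
// Hypothetical helper object, not part of Spark
object OverheadSketch {
  // Assumed lower bound for the factor-based overhead, in MiB
  val MinOverheadMiB: Long = 384L

  def overheadMiB(
      memoryOverheadMiB: Option[Long], // explicit overhead, if configured
      executorMemoryMiB: Long,
      overheadFactor: Double): Long =
    // Assumption: an explicit overhead wins; otherwise scale executor memory
    // by the factor and apply the assumed minimum
    memoryOverheadMiB.getOrElse(
      math.max((overheadFactor * executorMemoryMiB).toLong, MinOverheadMiB))
}
```

With these assumptions, 8192 MiB of executor memory and a factor of 0.1 give 819 MiB of overhead, while 1024 MiB of executor memory falls back to the 384 MiB floor.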

If the given rpId resource profile ID is not the default ID (0), getResourcesForClusterManager...FIXME (there is so much to "digest")

getResourcesForClusterManager...FIXME

In the end, getResourcesForClusterManager creates an ExecutorResourcesOrDefaults.

getResourcesForClusterManager is used when:

  • BasicExecutorFeatureStep (Spark on Kubernetes) is created
  • YarnAllocator (Spark on YARN) is requested to createYarnResourceForResourceProfile