
StorageLevel

StorageLevel is a set of flags that control how an RDD is stored.

Flag          Default Value
useDisk       false
useMemory     true
useOffHeap    false
deserialized  false
replication   1

Restrictions

  1. replication must be less than 40 (for calculating the hash code)
  2. An off-heap storage level does not support deserialized storage
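
As a rough illustration, the two restrictions can be modeled with a small standalone class. LevelSketch is a hypothetical stand-in written for this page, not Spark's StorageLevel, and enforcing both checks with require is an assumption.

```scala
import scala.util.Try

// Hypothetical stand-in for StorageLevel (not Spark's class).
// Field names and defaults follow the flags table above.
final case class LevelSketch(
    useDisk: Boolean = false,
    useMemory: Boolean = true,
    useOffHeap: Boolean = false,
    deserialized: Boolean = false,
    replication: Int = 1) {
  // Restriction 1: replication must stay below 40.
  require(replication < 40, "replication must be less than 40")
  // Restriction 2: off-heap storage cannot be deserialized.
  require(!(useOffHeap && deserialized),
    "off-heap storage does not support deserialized storage")
}

// The defaults satisfy both restrictions.
assert(Try(LevelSketch()).isSuccess)
// Each violation is rejected at construction time.
assert(Try(LevelSketch(replication = 40)).isFailure)
assert(Try(LevelSketch(useOffHeap = true, deserialized = true)).isFailure)
```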

Validation

isValid: Boolean

StorageLevel is considered valid when all of the following hold:

  1. It uses memory or disk
  2. replication is a positive number (at least the default 1 and less than 40)
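
The two conditions can be written as a single predicate. The sketch below uses a hypothetical helper, isValidSketch, that takes the relevant flags as plain parameters; it mirrors the conditions as listed here and is not Spark's source.

```scala
// Hypothetical helper mirroring the two validity conditions (not Spark's source).
def isValidSketch(useMemory: Boolean, useDisk: Boolean, replication: Int): Boolean = {
  val usesStorage = useMemory || useDisk                   // condition 1
  val replicationOk = replication > 0 && replication < 40  // condition 2
  usesStorage && replicationOk
}

// The default flags (memory only, one replica) are valid.
assert(isValidSketch(useMemory = true, useDisk = false, replication = 1))
// Neither memory nor disk: invalid.
assert(!isValidSketch(useMemory = false, useDisk = false, replication = 1))
// Zero replicas: invalid.
assert(!isValidSketch(useMemory = true, useDisk = false, replication = 0))
```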

Externalizable

StorageLevel is an Externalizable (Java).

writeExternal

writeExternal(
  out: ObjectOutput): Unit

writeExternal is part of the Externalizable (Java) abstraction.

writeExternal writes out the bitwise integer representation (toInt), followed by the replication of this StorageLevel.
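
A minimal sketch of that write order, using only java.io. The helper writeSketch is hypothetical, and writing each value as a single byte is an assumption made for illustration; only the order (flags first, then replication) comes from the description above.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
import java.io.{ObjectInputStream, ObjectOutput, ObjectOutputStream}

// Hypothetical helper: flags first, then replication.
// Assumption: each value fits in, and is written as, one byte.
def writeSketch(out: ObjectOutput, flags: Int, replication: Int): Unit = {
  out.writeByte(flags)
  out.writeByte(replication)
}

// Write flags = 5 (binary 101) and replication = 1 into an in-memory buffer.
val buffer = new ByteArrayOutputStream()
val out = new ObjectOutputStream(buffer)
writeSketch(out, flags = 5, replication = 1)
out.close() // flushes buffered bytes into `buffer`

// Read the two values back in the same order.
val in = new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray))
val flagsRead = in.readByte()
val replicationRead = in.readByte()
in.close()

assert(flagsRead == 5)
assert(replicationRead == 1)
```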

Bitwise Integer Representation

toInt: Int

toInt converts this StorageLevel to a numeric (binary) representation by turning on one bit for each of the following flags that is set (listed from the least significant bit up):

  1. deserialized
  2. useOffHeap
  3. useMemory
  4. useDisk

For example, MEMORY_ONLY is deserialized and uses memory, so its representation has the bits for 1 and 4 set (binary 101):

import org.apache.spark.storage.StorageLevel.MEMORY_ONLY
assert(MEMORY_ONLY.toInt == (0 | 1 | 4))

scala> println(MEMORY_ONLY.toInt.toBinaryString)
101
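
The bit layout can be sketched as a standalone function. toIntSketch is a hypothetical helper, not Spark's method. The values 1 (deserialized) and 4 (useMemory) come from the example above; 2 (useOffHeap) and 8 (useDisk) are assumed from the order in which the flags are listed.

```scala
// Hypothetical helper that packs the four flags into the low four bits.
def toIntSketch(
    useDisk: Boolean,
    useMemory: Boolean,
    useOffHeap: Boolean,
    deserialized: Boolean): Int = {
  var bits = 0
  if (deserialized) bits |= 1 // lowest bit
  if (useOffHeap) bits |= 2   // assumed from the list order
  if (useMemory) bits |= 4
  if (useDisk) bits |= 8      // assumed from the list order
  bits
}

// Deserialized and in memory, as in the MEMORY_ONLY example: 1 | 4 = 5.
val memoryOnly = toIntSketch(
  useDisk = false, useMemory = true, useOffHeap = false, deserialized = true)
assert(memoryOnly == 5)
assert(memoryOnly.toBinaryString == "101")
```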

toInt is used when: