= StorageLevel

StorageLevel describes how an RDD is persisted (and addresses the following concerns):

  • Does RDD use disk?
  • Does RDD use memory to store data?
  • How much of RDD is in memory?
  • Does RDD use off-heap memory?
  • Should an RDD be serialized or deserialized (while storing the data)?
  • How many replicas (default: 1) to use (can only be less than 40)?

There are the following StorageLevels (the number _2 in the name denotes 2 replicas):

  • [[NONE]] NONE (default)
  • DISK_ONLY
  • DISK_ONLY_2
  • [[MEMORY_ONLY]] MEMORY_ONLY (default for spark-rdd-caching.md#cache[cache operation] for RDDs)
  • MEMORY_ONLY_2
  • MEMORY_ONLY_SER
  • MEMORY_ONLY_SER_2
  • [[MEMORY_AND_DISK]] MEMORY_AND_DISK
  • MEMORY_AND_DISK_2
  • MEMORY_AND_DISK_SER
  • MEMORY_AND_DISK_SER_2
  • OFF_HEAP
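
The mapping from level names to the underlying flags can be sketched with a small standalone model. Note this is a hypothetical `Level` case class for illustration only, not Spark's own `org.apache.spark.storage.StorageLevel`; the flag combinations mirror the named levels above:

```scala
// Hypothetical stand-in for org.apache.spark.storage.StorageLevel,
// showing how each named level maps to the underlying flags.
// Fields: useDisk, useMemory, useOffHeap, deserialized, replication.
case class Level(
  useDisk: Boolean,
  useMemory: Boolean,
  useOffHeap: Boolean,
  deserialized: Boolean,
  replication: Int = 1)

object Levels {
  val NONE                  = Level(false, false, false, false)
  val DISK_ONLY             = Level(true,  false, false, false)
  val DISK_ONLY_2           = Level(true,  false, false, false, 2)
  val MEMORY_ONLY           = Level(false, true,  false, true)   // deserialized in memory
  val MEMORY_ONLY_2         = Level(false, true,  false, true,  2)
  val MEMORY_ONLY_SER       = Level(false, true,  false, false)  // serialized in memory
  val MEMORY_ONLY_SER_2     = Level(false, true,  false, false, 2)
  val MEMORY_AND_DISK       = Level(true,  true,  false, true)
  val MEMORY_AND_DISK_2     = Level(true,  true,  false, true,  2)
  val MEMORY_AND_DISK_SER   = Level(true,  true,  false, false)
  val MEMORY_AND_DISK_SER_2 = Level(true,  true,  false, false, 2)
  val OFF_HEAP              = Level(true,  true,  true,  false)
}
```

The `_SER` variants differ from their plain counterparts only in the `deserialized` flag, and the `_2` variants only in `replication`.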

You can check the current storage level using the getStorageLevel operation.

[source, scala]

val lines = sc.textFile("README.md")

scala> lines.getStorageLevel
res0: org.apache.spark.storage.StorageLevel = StorageLevel(disk=false, memory=false, offheap=false, deserialized=false, replication=1)

[[useMemory]] StorageLevel uses the useMemory flag to indicate whether to use memory for data storage.

[source, scala]

useMemory: Boolean

[[useDisk]] StorageLevel uses the useDisk flag to indicate whether to use disk for data storage.

[source, scala]

useDisk: Boolean

[[deserialized]] StorageLevel uses the deserialized flag to indicate whether to store data in deserialized format.

[source, scala]

deserialized: Boolean

[[replication]] StorageLevel uses the replication property to indicate how many block managers to replicate the data to.

[source, scala]

replication: Int
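
Taken together, the four flags and the replication count fully describe a storage level, which is what the REPL output above renders. A minimal sketch of such a description, with a hypothetical `mkLevel` helper (not a Spark API) that also enforces the fewer-than-40-replicas constraint mentioned earlier:

```scala
// Hypothetical helper mirroring how a StorageLevel's flags and replication
// combine into the textual description seen in the REPL. The require call
// sketches the constraint that replication must be less than 40.
def mkLevel(useDisk: Boolean, useMemory: Boolean, useOffHeap: Boolean,
            deserialized: Boolean, replication: Int = 1): String = {
  require(replication < 40, "Replication restricted to be less than 40")
  s"StorageLevel(disk=$useDisk, memory=$useMemory, offheap=$useOffHeap, " +
    s"deserialized=$deserialized, replication=$replication)"
}
```

For example, `mkLevel(false, false, false, false)` yields the same description as the freshly-loaded, non-persisted RDD shown earlier, while a replication of 40 or more fails the `require` check.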


Last update: 2020-10-06