StorageLevel¶
StorageLevel is a set of flags that control how an RDD is stored.
| Flag | Default Value |
|---|---|
| useDisk | false |
| useMemory | true |
| useOffHeap | false |
| deserialized | false |
| replication | 1 |
Restrictions¶
- The replication is restricted to be less than 40 (for calculating the hash code)
- Off-heap storage level does not support deserialized storage
Validation¶
StorageLevel is considered valid when the following all hold:
- Uses memory or disk
- Replication is a positive number (from the default 1 up to, but not including, 40)
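The restrictions and the validity check can be sketched together in a small standalone model. `FlagsSketch` is a hypothetical class written for illustration, not Spark's `StorageLevel`; its field names and defaults follow the flags table, and it assumes the restrictions are enforced at construction time while validity is a separate query.

```scala
// Hypothetical stand-in for StorageLevel (illustration only, not Spark's class).
final case class FlagsSketch(
    useDisk: Boolean = false,
    useMemory: Boolean = true,
    useOffHeap: Boolean = false,
    deserialized: Boolean = false,
    replication: Int = 1) {

  // Restrictions: assumed here to be checked when the level is created.
  require(replication < 40,
    "Replication restricted to be less than 40 for calculating hash codes")
  require(!(useOffHeap && deserialized),
    "Off-heap storage level does not support deserialized storage")

  // Validity: some storage is used and replication is a positive number.
  def isValid: Boolean = (useMemory || useDisk) && replication > 0
}
```

With this model, `FlagsSketch()` is valid, `FlagsSketch(useMemory = false)` is not (no storage is used), and `FlagsSketch(replication = 40)` fails at construction.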
Externalizable¶
StorageLevel is an Externalizable (Java).
writeExternal¶
writeExternal is part of the Externalizable (Java) abstraction.
writeExternal writes the bitwise representation out followed by the replication of this StorageLevel.
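A minimal sketch of that write order, assuming each value is written as a single byte and read back in the same order. `ExternalSketch` is a hypothetical class for illustration, not Spark's `StorageLevel`; it stores the bitwise representation directly as `flags` rather than as separate booleans.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, Externalizable,
  ObjectInput, ObjectInputStream, ObjectOutput, ObjectOutputStream}

// Hypothetical stand-in for StorageLevel (illustration only).
class ExternalSketch(var flags: Int, var replication: Int) extends Externalizable {
  // Externalizable classes need a public no-arg constructor;
  // these placeholder values are overwritten by readExternal.
  def this() = this(0, 1)

  override def writeExternal(out: ObjectOutput): Unit = {
    out.writeByte(flags)       // bitwise representation first
    out.writeByte(replication) // then the replication
  }

  override def readExternal(in: ObjectInput): Unit = {
    flags = in.readByte().toInt       // read in the order written
    replication = in.readByte().toInt
  }
}

object ExternalSketch {
  // Writes a level to an in-memory buffer and reads it into a fresh instance.
  def roundTrip(level: ExternalSketch): ExternalSketch = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    level.writeExternal(out)
    out.close() // flushes the buffered bytes into the array

    val in = new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray))
    val restored = new ExternalSketch()
    restored.readExternal(in)
    in.close()
    restored
  }
}
```

Because a replication below 40 fits in one byte, two bytes are enough to carry the whole level in this sketch.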
Bitwise Integer Representation¶
toInt converts this StorageLevel to a numeric (binary) representation by turning on one bit for each of the following flags that is set (bit value in parentheses):
- useDisk (8)
- useMemory (4)
- useOffHeap (2)
- deserialized (1)
For example, MEMORY_ONLY is deserialized and uses memory, so only those two bits are set:
import org.apache.spark.storage.StorageLevel.MEMORY_ONLY
assert(MEMORY_ONLY.toInt == (0 | 1 | 4))
scala> println(MEMORY_ONLY.toInt.toBinaryString)
101
toInt is used when:
- StorageLevel is requested to writeExternal and hashCode
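The bit arithmetic behind toInt can be sketched without Spark on the classpath. `StorageBits` is a hypothetical helper, not part of Spark. The bit values 4 (useMemory) and 1 (deserialized) follow from the MEMORY_ONLY example; 8 (useDisk) and 2 (useOffHeap) are this sketch's assumption about the remaining bits, and the `hash` formula is likewise an assumption about how hashCode combines toInt with the replication.

```scala
// Hypothetical helper mirroring the flag-to-bit mapping (illustration only).
object StorageBits {
  // Assumed bit values: useDisk = 8, useMemory = 4, useOffHeap = 2, deserialized = 1.
  def toInt(
      useDisk: Boolean,
      useMemory: Boolean,
      useOffHeap: Boolean,
      deserialized: Boolean): Int = {
    var ret = 0
    if (useDisk) ret |= 8
    if (useMemory) ret |= 4
    if (useOffHeap) ret |= 2
    if (deserialized) ret |= 1
    ret
  }

  // Assumed combination: with replication below 40, multiplying by 41
  // keeps distinct (flags, replication) pairs from colliding.
  def hash(flags: Int, replication: Int): Int = flags * 41 + replication
}
```

Under these assumptions, `StorageBits.toInt(useDisk = false, useMemory = true, useOffHeap = false, deserialized = true)` returns 5, whose binary string is `101`, matching the MEMORY_ONLY example.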