StorageLevel¶
StorageLevel is a set of the following flags that control how an RDD is stored.
| Flag | Default Value |
|---|---|
| useDisk | false |
| useMemory | true |
| useOffHeap | false |
| deserialized | false |
| replication | 1 |
Restrictions¶
- The replication is restricted to be less than 40 (for calculating the hash code)
- Off-heap storage level does not support deserialized storage
Validation¶
isValid: Boolean
StorageLevel is considered valid when the following all hold:
- Uses memory or disk
- Replication is a positive number (from the default 1 up to, but not including, 40)
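The restrictions and the validity check above can be sketched in a few lines of Scala. This is a minimal, standalone illustration, not Spark's actual class: the name `StorageLevelSketch` is hypothetical, and enforcing the restrictions with `require` at construction time is an assumption made for the example.

```scala
// Hypothetical stand-in for StorageLevel; it is not Spark's actual class.
final case class StorageLevelSketch(
    useDisk: Boolean = false,
    useMemory: Boolean = true,
    useOffHeap: Boolean = false,
    deserialized: Boolean = false,
    replication: Int = 1) {

  // Restriction: replication has to stay below 40.
  require(replication < 40, "Replication must be less than 40")

  // Restriction: off-heap storage cannot hold deserialized data.
  require(!(useOffHeap && deserialized),
    "Off-heap storage level does not support deserialized storage")

  // Valid when memory or disk is used and replication is positive.
  def isValid: Boolean = (useMemory || useDisk) && replication > 0
}
```

With the default flags, `StorageLevelSketch().isValid` holds, while a level that uses neither memory nor disk, or has a replication of 0, is constructed but reported as invalid. A replication of 40 is rejected outright in this sketch.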
Externalizable¶
StorageLevel is an Externalizable (Java).
writeExternal¶
writeExternal(
out: ObjectOutput): Unit
writeExternal is part of the Externalizable (Java) abstraction.
writeExternal writes out the bitwise representation, followed by the replication of this StorageLevel.
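The write order can be illustrated with a small standalone sketch. The `WriteExternalSketch` object and its `writeLevel` and `roundTrip` helpers are hypothetical names, and writing each value as a single byte is an assumption of this example. Only the order (flags first, replication second) comes from the description above.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
import java.io.{ObjectInputStream, ObjectOutput, ObjectOutputStream}

object WriteExternalSketch {
  // Writes the bitwise flags first, then the replication (one byte each, by assumption).
  def writeLevel(out: ObjectOutput, flags: Int, replication: Int): Unit = {
    out.writeByte(flags)
    out.writeByte(replication)
  }

  // Serializes the two values and reads them back in the same order.
  def roundTrip(flags: Int, replication: Int): (Int, Int) = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    writeLevel(out, flags, replication)
    out.close()

    val in = new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray))
    try {
      val readFlags = in.readByte().toInt
      val readReplication = in.readByte().toInt
      (readFlags, readReplication)
    } finally {
      in.close()
    }
  }
}
```

A reader has to consume the values in the same order they were written, which is why the sketch reads the flags before the replication.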
Bitwise Integer Representation¶
toInt: Int
toInt converts this StorageLevel to a numeric (binary) representation by turning on the corresponding bit for each of the following flags (if used, from the lowest bit to the highest):
- deserialized (1)
- useOffHeap (2)
- useMemory (4)
- useDisk (8)
In other words, the following number in bitwise representation means that the StorageLevel is deserialized and uses memory:
import org.apache.spark.storage.StorageLevel.MEMORY_ONLY
assert(MEMORY_ONLY.toInt == (0 | 1 | 4))
scala> println(MEMORY_ONLY.toInt.toBinaryString)
101
toInt is used when:
- StorageLevel is requested to writeExternal and hashCode
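The bit arithmetic behind toInt can be sketched without Spark on the classpath. The `ToIntSketch` object below is a hypothetical helper. The values 1 (deserialized) and 4 (useMemory) follow from the `MEMORY_ONLY` example above, while 2 (useOffHeap) and 8 (useDisk) are assumed here to fill the remaining bits.

```scala
object ToIntSketch {
  // One bit per flag; deserialized is the lowest bit.
  val Deserialized = 1
  val UseOffHeap = 2 // assumed value
  val UseMemory = 4
  val UseDisk = 8 // assumed value

  // Turns on the bit of every flag that is set.
  def toInt(
      useDisk: Boolean,
      useMemory: Boolean,
      useOffHeap: Boolean,
      deserialized: Boolean): Int = {
    var bits = 0
    if (useDisk) bits |= UseDisk
    if (useMemory) bits |= UseMemory
    if (useOffHeap) bits |= UseOffHeap
    if (deserialized) bits |= Deserialized
    bits
  }
}
```

For a level that is deserialized and uses memory only, `ToIntSketch.toInt(useDisk = false, useMemory = true, useOffHeap = false, deserialized = true)` evaluates to 5, which is `101` in binary.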