StorageLevel¶
StorageLevel
is the following flags for controlling the storage of an RDD.
Flag | Default Value |
---|---|
useDisk | false |
useMemory | true |
useOffHeap | false |
deserialized | false |
replication | 1 |
Restrictions¶
- The replication is restricted to be less than
40
(for calculating the hash code) - Off-heap storage level does not support deserialized storage
Validation¶
isValid: Boolean
StorageLevel
is considered valid when the following all hold:
- Uses memory or disk
- Replication is non-zero positive number (between the default
1
and 40)
Externalizable¶
DirectTaskResult
is an Externalizable
(Java).
writeExternal¶
writeExternal(
out: ObjectOutput): Unit
writeExternal
is part of the Externalizable
(Java) abstraction.
writeExternal
writes the bitwise representation out followed by the replication of this StorageLevel
.
Bitwise Integer Representation¶
toInt: Int
toInt
converts this StorageLevel
to numeric (binary) representation by turning the corresponding bits on for the following (if used and in that order):
In other words, the following number in bitwise representation says the StorageLevel
is deserialized and useMemory:
import org.apache.spark.storage.StorageLevel.MEMORY_ONLY
assert(MEMORY_ONLY.toInt == (0 | 1 | 4))
scala> println(MEMORY_ONLY.toInt.toBinaryString)
101
toInt
is used when:
StorageLevel
is requested to writeExternal and hashCode