= StorageLevel

StorageLevel describes how an RDD is persisted (and addresses the following concerns):

  • Does RDD use disk?
  • Does RDD use memory to store data?
  • How much of RDD is in memory?
  • Does RDD use off-heap memory?
  • Should an RDD be serialized or deserialized (while storing the data)?
  • How many replicas (default: 1) to use (can only be less than 40)?

There are the following StorageLevels (the number _2 in the name denotes 2 replicas):

  • [[NONE]] NONE (default)
  • DISK_ONLY
  • DISK_ONLY_2
  • [[MEMORY_ONLY]] MEMORY_ONLY (default for spark-rdd-caching.md#cache[cache operation] for RDDs)
  • MEMORY_ONLY_2
  • MEMORY_ONLY_SER
  • MEMORY_ONLY_SER_2
  • [[MEMORY_AND_DISK]] MEMORY_AND_DISK
  • MEMORY_AND_DISK_2
  • MEMORY_AND_DISK_SER
  • MEMORY_AND_DISK_SER_2
  • OFF_HEAP
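
The mapping from level names to the underlying flags can be sketched with a small standalone model. Note this is a hypothetical `Level` case class for illustration only, not Spark's own `org.apache.spark.storage.StorageLevel`; the flag combinations mirror the named levels above:

```scala
// Hypothetical stand-in for org.apache.spark.storage.StorageLevel,
// showing how each named level maps to the underlying flags.
// Fields: useDisk, useMemory, useOffHeap, deserialized, replication.
case class Level(
  useDisk: Boolean,
  useMemory: Boolean,
  useOffHeap: Boolean,
  deserialized: Boolean,
  replication: Int = 1)

object Levels {
  val NONE                  = Level(false, false, false, false)
  val DISK_ONLY             = Level(true,  false, false, false)
  val DISK_ONLY_2           = Level(true,  false, false, false, 2)
  val MEMORY_ONLY           = Level(false, true,  false, true)   // deserialized in memory
  val MEMORY_ONLY_2         = Level(false, true,  false, true,  2)
  val MEMORY_ONLY_SER       = Level(false, true,  false, false)  // serialized in memory
  val MEMORY_ONLY_SER_2     = Level(false, true,  false, false, 2)
  val MEMORY_AND_DISK       = Level(true,  true,  false, true)
  val MEMORY_AND_DISK_2     = Level(true,  true,  false, true,  2)
  val MEMORY_AND_DISK_SER   = Level(true,  true,  false, false)
  val MEMORY_AND_DISK_SER_2 = Level(true,  true,  false, false, 2)
  val OFF_HEAP              = Level(true,  true,  true,  false)
}
```

The `_SER` variants differ from their plain counterparts only in the `deserialized` flag, and the `_2` variants only in `replication`.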

You can check the current storage level using the getStorageLevel operation.

[source, scala]

val lines = sc.textFile("README.md")

scala> lines.getStorageLevel
res0: org.apache.spark.storage.StorageLevel = StorageLevel(disk=false, memory=false, offheap=false, deserialized=false, replication=1)

[[useMemory]] StorageLevel uses the useMemory flag to indicate whether to use memory for data storage.

[source, scala]

useMemory: Boolean

[[useDisk]] StorageLevel uses the useDisk flag to indicate whether to use disk for data storage.

[source, scala]

useDisk: Boolean

[[deserialized]] StorageLevel uses the deserialized flag to indicate whether to store data in deserialized format.

[source, scala]

deserialized: Boolean

[[replication]] StorageLevel uses the replication property to indicate how many block managers to replicate the data to.

[source, scala]

replication: Int
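
Taken together, the four flags and the replication count fully describe a storage level, which is what the REPL output above renders. A minimal sketch of such a description, with a hypothetical `mkLevel` helper (not a Spark API) that also enforces the fewer-than-40-replicas constraint mentioned earlier:

```scala
// Hypothetical helper mirroring how a StorageLevel's flags and replication
// combine into the textual description seen in the REPL. The require call
// sketches the constraint that replication must be less than 40.
def mkLevel(useDisk: Boolean, useMemory: Boolean, useOffHeap: Boolean,
            deserialized: Boolean, replication: Int = 1): String = {
  require(replication < 40, "Replication restricted to be less than 40")
  s"StorageLevel(disk=$useDisk, memory=$useMemory, offheap=$useOffHeap, " +
    s"deserialized=$deserialized, replication=$replication)"
}
```

For example, `mkLevel(false, false, false, false)` yields the same description as the freshly-loaded, non-persisted RDD shown earlier, while a replication of 40 or more fails the `require` check.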


Last update: 2020-10-06