SerializerManager¶
SerializerManager
is used to select the Serializer for shuffle blocks.
Creating Instance¶
SerializerManager
takes the following to be created:
- Default Serializer
- SparkConf
- (optional) Encryption Key (
Option[Array[Byte]]
)
SerializerManager
is created when:
SparkEnv
utility is used to create a SparkEnv (for the driver and executors)
Kryo-Compatible Types¶
Kryo-Compatible Types are the following primitive types, Array
s of the primitive types and String
s:
Boolean
Byte
Char
Double
Float
Int
Long
Null
Short
Default Serializer¶
SerializerManager
is given a Serializer when created (based on spark.serializer configuration property).
The Serializer
is used when SerializerManager
is requested for a Serializer.
Tip
Enable DEBUG
logging level of SparkEnv to be told about the selected Serializer
.
Using serializer: [serializer]
Accessing SerializerManager¶
SerializerManager
is available using SparkEnv on the driver and executors.
import org.apache.spark.SparkEnv
SparkEnv.get.serializerManager
KryoSerializer¶
SerializerManager
creates a KryoSerializer when created.
KryoSerializer
is used as the serializer when the types of a given key and value are Kryo-compatible.
Selecting Serializer¶
getSerializer(
ct: ClassTag[_],
autoPick: Boolean): Serializer
getSerializer(
keyClassTag: ClassTag[_],
valueClassTag: ClassTag[_]): Serializer
getSerializer
returns the KryoSerializer when the given ClassTag
s are Kryo-compatible and the autoPick
flag is true
. Otherwise, getSerializer
returns the default Serializer.
autoPick
flag is true
for all BlockIds but Spark Streaming's StreamBlockId
s.
getSerializer
(with autoPick
flag) is used when:
SerializerManager
is requested to dataSerializeStream, dataSerializeWithExplicitClassTag and dataDeserializeStreamSerializedValuesHolder
(of MemoryStore) is requested for aSerializationStream
getSerializer
(with key and value ClassTag
s only) is used when:
ShuffledRDD
is requested for dependencies
dataSerializeStream¶
dataSerializeStream[T: ClassTag](
blockId: BlockId,
outputStream: OutputStream,
values: Iterator[T]): Unit
dataSerializeStream
...FIXME
dataSerializeStream
is used when:
BlockManager
is requested to doPutIterator and dropFromMemory
dataSerializeWithExplicitClassTag¶
dataSerializeWithExplicitClassTag(
blockId: BlockId,
values: Iterator[_],
classTag: ClassTag[_]): ChunkedByteBuffer
dataSerializeWithExplicitClassTag
...FIXME
dataSerializeWithExplicitClassTag
is used when:
BlockManager
is requested to doGetLocalBytesSerializerManager
is requested to dataSerialize
dataDeserializeStream¶
dataDeserializeStream[T](
blockId: BlockId,
inputStream: InputStream)
(classTag: ClassTag[T]): Iterator[T]
dataDeserializeStream
...FIXME
dataDeserializeStream
is used when:
BlockStoreUpdater
is requested to saveDeserializedValuesToMemoryStoreBlockManager
is requested to getLocalValues and getRemoteValuesMemoryStore
is requested to putIteratorAsBytes (whenPartiallySerializedBlock
is requested for aPartiallyUnrolledIterator
)