CompressionCodec

With spark.broadcast.compress enabled (which is the default), TorrentBroadcast uses compression for broadcast blocks.

FIXME What’s compressed?

Table 1. Built-in Compression Codecs
| Codec Alias | Fully-Qualified Class Name | Notes |
|---|---|---|
| lz4 | org.apache.spark.io.LZ4CompressionCodec | The default implementation |
| lzf | org.apache.spark.io.LZFCompressionCodec | |
| snappy | org.apache.spark.io.SnappyCompressionCodec | The fallback when the default codec is not available. |

An implementation of the CompressionCodec trait has to offer a constructor that accepts a single SparkConf argument. See the createCodec factory method section in this document.

You can control the default compression codec in a Spark application using the spark.io.compression.codec Spark property.
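As an illustration of the constructor requirement and of the property, here is a minimal sketch that assumes spark-core is on the classpath. GzipCompressionCodec is a hypothetical class written for this example (it is not a built-in codec), and it assumes that the property also accepts a fully-qualified class name, as the class-name column of the table suggests.

```scala
import java.io.{InputStream, OutputStream}
import java.util.zip.{GZIPInputStream, GZIPOutputStream}

import org.apache.spark.SparkConf
import org.apache.spark.io.CompressionCodec

// Hypothetical custom codec, for illustration only (not production-ready).
// The single SparkConf constructor argument is what lets Spark instantiate it.
class GzipCompressionCodec(conf: SparkConf) extends CompressionCodec {
  override def compressedOutputStream(s: OutputStream): OutputStream =
    new GZIPOutputStream(s)

  override def compressedInputStream(s: InputStream): InputStream =
    new GZIPInputStream(s)
}

object CodecConfigExample {
  // Select a built-in codec by its alias.
  val withAlias: SparkConf =
    new SparkConf().set("spark.io.compression.codec", "snappy")

  // Select the hypothetical custom codec by its class name.
  val withClassName: SparkConf =
    new SparkConf().set("spark.io.compression.codec", classOf[GzipCompressionCodec].getName)
}
```

The same property can also be passed on the command line or in a properties file; the Scala form is shown only to keep the examples in one language.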

Creating CompressionCodec — createCodec Factory Method

createCodec(conf: SparkConf): CompressionCodec
createCodec(conf: SparkConf, codecName: String): CompressionCodec

createCodec looks up the input codecName in the internal shortCompressionCodecNames lookup table, ignoring case. A name that is not a known alias is treated as a fully-qualified class name.

createCodec then finds the constructor of the compression codec’s implementation that accepts a single SparkConf argument and uses it to create the codec.

If the compression codec cannot be found, createCodec throws an IllegalArgumentException with the following message:

Codec [<codecName>] is not available. Consider setting spark.io.compression.codec=snappy
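The lookup, reflection, and error handling described in this section can be sketched without Spark on the classpath. This is a minimal sketch under stated assumptions, not Spark's implementation: Conf, Codec, Lz4Codec, SnappyCodec, and shortNames are hypothetical stand-ins for SparkConf, CompressionCodec, the built-in codecs, and shortCompressionCodecNames. The sketch also assumes that a name that is not a known alias is tried as a class name.

```scala
import java.util.Locale

object CreateCodecSketch {
  // Hypothetical stand-ins so the sketch runs without Spark.
  final class Conf(val settings: Map[String, String])
  trait Codec
  final class Lz4Codec(conf: Conf) extends Codec
  final class SnappyCodec(conf: Conf) extends Codec

  // Alias -> class name, playing the role of shortCompressionCodecNames.
  val shortNames: Map[String, String] = Map(
    "lz4" -> classOf[Lz4Codec].getName,
    "snappy" -> classOf[SnappyCodec].getName)

  def createCodec(conf: Conf, codecName: String): Codec = {
    // Case-insensitive alias lookup; anything else is tried as a class name.
    val className = shortNames.getOrElse(codecName.toLowerCase(Locale.ROOT), codecName)
    try {
      // Find the constructor that takes the single configuration argument
      // and use it to instantiate the codec.
      Class.forName(className)
        .getConstructor(classOf[Conf])
        .newInstance(conf)
        .asInstanceOf[Codec]
    } catch {
      case _: ClassNotFoundException | _: NoSuchMethodException =>
        throw new IllegalArgumentException(
          s"Codec [$codecName] is not available. " +
            "Consider setting spark.io.compression.codec=snappy")
    }
  }
}
```

In this sketch, createCodec(new Conf(Map.empty), "LZ4") resolves to the lz4 stand-in because of the case-insensitive lookup, while an unknown name fails with the message shown above.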

getCodecName Method

getCodecName(conf: SparkConf): String

getCodecName reads the spark.io.compression.codec Spark property from the input SparkConf (conf) and falls back to lz4 when the property is not set.
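The read-or-default behaviour can be illustrated with a small sketch. A plain Map stands in for SparkConf here, and the object and constant names are hypothetical; only the property name and the lz4 default come from the description above.

```scala
object GetCodecNameSketch {
  val CodecKey = "spark.io.compression.codec"
  val DefaultCodec = "lz4"

  // A plain Map stands in for SparkConf in this sketch.
  // Returns the configured codec name, or the default when the key is absent.
  def getCodecName(settings: Map[String, String]): String =
    settings.getOrElse(CodecKey, DefaultCodec)
}
```

With an empty map the sketch yields the default, and with the key set to snappy it yields snappy.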

Settings

Table 2. Settings
| Name | Default value | Description |
|---|---|---|
| spark.io.compression.codec | lz4 | The compression codec to use. Used when getCodecName is called to find the current compression codec. |