CompressionCodecs¶
CompressionCodecs utility is used to set Hadoop compression-related configuration properties when CSV, JSON and Text file formats are requested to prepare a write.
Compression Codecs¶
Alias | Class Name |
---|---|
none | |
uncompressed | |
bzip2 | org.apache.hadoop.io.compress.BZip2Codec |
deflate | org.apache.hadoop.io.compress.DeflateCodec |
gzip | org.apache.hadoop.io.compress.GzipCodec |
lz4 | org.apache.hadoop.io.compress.Lz4Codec |
snappy | org.apache.hadoop.io.compress.SnappyCodec |
getCodecClassName¶
getCodecClassName(
name: String): String
getCodecClassName looks up a codec by name in the known codecs and makes sure that it's available on the classpath.
getCodecClassName is used when:

* CSVOptions is requested for compressionCodec
* JSONOptions is requested for compressionCodec
* TextOptions is requested for compressionCodec
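
The lookup can be sketched roughly as follows. This is a minimal, self-contained approximation and not Spark's actual source: the object name `CodecLookupSketch`, the case-insensitive alias matching, the fallback that treats an unknown name as a fully-qualified class name, the `isAvailable` parameter, and the error message are all assumptions made for illustration. The classpath check is reduced to a plain `Class.forName` call.

```scala
import java.util.Locale
import scala.util.Try

object CodecLookupSketch {
  // Aliases from the Compression Codecs table; `none` and `uncompressed`
  // have no class name, modelled here as null to match the String return type.
  val shortNames: Map[String, String] = Map(
    "none" -> null,
    "uncompressed" -> null,
    "bzip2" -> "org.apache.hadoop.io.compress.BZip2Codec",
    "deflate" -> "org.apache.hadoop.io.compress.DeflateCodec",
    "gzip" -> "org.apache.hadoop.io.compress.GzipCodec",
    "lz4" -> "org.apache.hadoop.io.compress.Lz4Codec",
    "snappy" -> "org.apache.hadoop.io.compress.SnappyCodec"
  )

  // `isAvailable` is a hypothetical hook (not part of the documented signature)
  // so the classpath check can be replaced when Hadoop is not on the classpath.
  def getCodecClassName(
      name: String,
      isAvailable: String => Boolean = cls => Try(Class.forName(cls)).isSuccess): String = {
    // Assumption: aliases match case-insensitively; anything else is taken as a class name.
    val codecName = shortNames.getOrElse(name.toLowerCase(Locale.ROOT), name)
    if (codecName != null && !isAvailable(codecName)) {
      throw new IllegalArgumentException(
        s"Codec [$codecName] is not available. Known codecs are ${shortNames.keys.mkString(", ")}.")
    }
    codecName
  }
}
```

Under these assumptions, `CodecLookupSketch.getCodecClassName("gzip")` would return the Gzip codec class name when the Hadoop classes are on the classpath and throw otherwise, while `none` and `uncompressed` resolve to no codec at all.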
setCodecConfiguration¶
setCodecConfiguration(
conf: Configuration,
codec: String): Unit
setCodecConfiguration sets mapreduce compression-related configuration properties in the given Configuration (Apache Hadoop), based on whether codec is defined or not.
codec | Configuration Property | Value |
---|---|---|
defined | mapreduce.output.fileoutputformat.compress | true |
defined | mapreduce.output.fileoutputformat.compress.type | BLOCK |
defined | mapreduce.output.fileoutputformat.compress.codec | codec |
defined | mapreduce.map.output.compress | true |
defined | mapreduce.map.output.compress.codec | codec |
undefined | mapreduce.output.fileoutputformat.compress | false |
undefined | mapreduce.map.output.compress | false |
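
The table can be read as a single branch on whether codec is defined. The sketch below is an illustration only: it uses a mutable `Map` as a stand-in for the Hadoop `Configuration` so that it runs without Hadoop on the classpath, and the object name `CodecConfigSketch`, the `Conf` type alias, and the use of `null` to mean "undefined" are assumptions.

```scala
import scala.collection.mutable

object CodecConfigSketch {
  // Stand-in for org.apache.hadoop.conf.Configuration (assumption; keeps the sketch dependency-free).
  type Conf = mutable.Map[String, String]

  def setCodecConfiguration(conf: Conf, codec: String): Unit = {
    if (codec != null) {
      // codec defined: enable compression for job output and map output
      conf("mapreduce.output.fileoutputformat.compress") = "true"
      conf("mapreduce.output.fileoutputformat.compress.type") = "BLOCK"
      conf("mapreduce.output.fileoutputformat.compress.codec") = codec
      conf("mapreduce.map.output.compress") = "true"
      conf("mapreduce.map.output.compress.codec") = codec
    } else {
      // codec undefined: only the two on/off flags are written
      conf("mapreduce.output.fileoutputformat.compress") = "false"
      conf("mapreduce.map.output.compress") = "false"
    }
  }
}
```

In this sketch the undefined branch only flips the two compress flags; any codec-related entries already present in the map are left untouched, which mirrors the rows of the table.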
setCodecConfiguration is used when:

* CSVFileFormat is requested to prepareWrite (based on compression or codec options)
* JsonFileFormat is requested to prepareWrite (based on compression option)
* TextFileFormat is requested to prepareWrite (based on compression option)
* CSVWrite is requested to prepareWrite (based on compression option)
* JsonWrite is requested to prepareWrite (based on compression option)
* TextWrite is requested to prepareWrite (based on compression option)