== CSVFileFormat
[[shortName]]
CSVFileFormat is a TextBasedFileFormat for the csv format, i.e. it registers itself to handle files in csv format and converts them to Spark SQL rows.
[source, scala]
----
spark.read.format("csv").load("csv-datasets")

// or the same as above using a shortcut
spark.read.csv("csv-datasets")
----
CSVFileFormat supports the <<options, CSV options>> described below.

[[options]][[CSVOptions]]
.CSVFileFormat's Options
[cols="1,1,3",options="header",width="100%"]
|===
| Option
| Default Value
| Description
| [[charset]] charset
| UTF-8
| Alias of <<encoding, encoding>>
| [[charToEscapeQuoteEscaping]] charToEscapeQuoteEscaping
| \\
| One character to...FIXME
| [[codec]] codec
|
a| Compression codec that can be either one of the known aliases or a fully-qualified class name.

Alias of <<compression, compression>>
| [[columnNameOfCorruptRecord]] columnNameOfCorruptRecord
| The value of the spark.sql.columnNameOfCorruptRecord configuration property
|
| [[comment]] comment
| \u0000
|
| [[compression]] compression
|
a| Compression codec that can be either one of the known aliases or a fully-qualified class name.

Alias of <<codec, codec>>
| [[dateFormat]] dateFormat
| yyyy-MM-dd
| Uses en_US locale
| [[delimiter]] delimiter
| , (comma)
| Alias of <<sep, sep>>
| [[encoding]] encoding
| UTF-8
| Alias of <<charset, charset>>
| [[escape]] escape
| \\
|
| [[escapeQuotes]] escapeQuotes
| true
|
| [[header_]] header
| false
|
| [[ignoreLeadingWhiteSpace]] ignoreLeadingWhiteSpace
a|
* false (for reading)
* true (for writing)
|
| [[ignoreTrailingWhiteSpace]] ignoreTrailingWhiteSpace
a|
* false (for reading)
* true (for writing)
|
| [[inferSchema]] inferSchema
| false
|
| [[maxCharsPerColumn]] maxCharsPerColumn
| -1
|
| [[maxColumns]] maxColumns
| 20480
|
| [[mode]] mode
| PERMISSIVE
a| Possible values:

* DROPMALFORMED
* PERMISSIVE (default)
* FAILFAST
| [[multiLine]] multiLine
| false
|
| [[nanValue]] nanValue
| NaN
|
| [[negativeInf]] negativeInf
| -Inf
|
| [[nullValue]] nullValue
| (empty string)
|
| [[positiveInf]] positiveInf
| Inf
|
| [[sep]] sep
| , (comma)
| Alias of <<delimiter, delimiter>>
| [[timestampFormat]] timestampFormat
| yyyy-MM-dd'T'HH:mm:ss.SSSXXX
| Uses <<timeZone, timeZone>> and en_US locale
| [[timeZone]] timeZone
| spark.sql.session.timeZone
|
| [[quote]] quote
| \"
|
| [[quoteAll]] quoteAll
| false
|
|===
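For illustration, the following is a minimal sketch of setting some of the options above on a DataFrameReader (the people.csv path is hypothetical):

[source, scala]
----
// Hypothetical input path; header, inferSchema, delimiter and mode
// are the CSV options described in the table above.
val people = spark.read
  .option("header", true)       // use the first line for column names
  .option("inferSchema", true)  // infer column types from the data
  .option("delimiter", ",")     // same as the sep option
  .option("mode", "PERMISSIVE") // keep malformed records (the default)
  .csv("people.csv")
----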
=== [[prepareWrite]] Preparing Write Job -- prepareWrite Method

[source, scala]
----
prepareWrite(
  sparkSession: SparkSession,
  job: Job,
  options: Map[String, String],
  dataSchema: StructType): OutputWriterFactory
----
prepareWrite...FIXME

prepareWrite is part of the FileFormat abstraction.
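Writing out a Dataset in csv format should trigger prepareWrite (which gives the OutputWriterFactory for the write job). A minimal sketch, with a hypothetical output path:

[source, scala]
----
// Saving in csv format; the output path "target/demo-csv" is hypothetical.
spark.range(5)
  .write
  .option("header", true)
  .mode("overwrite")
  .csv("target/demo-csv")
----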
=== [[buildReader]] Building Partitioned Data Reader -- buildReader Method

[source, scala]
----
buildReader(
  sparkSession: SparkSession,
  dataSchema: StructType,
  partitionSchema: StructType,
  requiredSchema: StructType,
  filters: Seq[Filter],
  options: Map[String, String],
  hadoopConf: Configuration): (PartitionedFile) => Iterator[InternalRow]
----
buildReader...FIXME

buildReader is part of the FileFormat abstraction.
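Reading csv files back goes through the (PartitionedFile) => Iterator[InternalRow] function that buildReader returns. A minimal sketch of a query that exercises it (the input path and schema are hypothetical):

[source, scala]
----
// Hypothetical input path and schema; an explicit schema avoids
// schema inference and is pushed down to the per-file reader.
val ds = spark.read
  .schema("id LONG")
  .option("header", true)
  .csv("target/demo-csv")
ds.show()
----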