# FileWrite

`FileWrite` is an extension of the `Write` abstraction for file writers.
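Put together, the contract described in the sections below can be pictured as a Scala trait along the following lines. This is a simplified sketch, not the actual Spark source (visibility modifiers and any other members of the real trait are omitted):

```scala
import org.apache.hadoop.mapreduce.Job
import org.apache.spark.sql.connector.write.{LogicalWriteInfo, Write}
import org.apache.spark.sql.execution.datasources.OutputWriterFactory
import org.apache.spark.sql.internal.SQLConf
import org.apache.spark.sql.types.{DataType, StructType}

// Simplified view of the FileWrite contract
// (member names follow the sections below)
trait FileWrite extends Write {
  def formatName: String
  def info: LogicalWriteInfo
  def paths: Seq[String]
  def prepareWrite(
      sqlConf: SQLConf,
      job: Job,
      options: Map[String, String],
      dataSchema: StructType): OutputWriterFactory
  def supportsDataType: DataType => Boolean
}
```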
## Contract

### Format Name

```scala
formatName: String
```
Used when:

- `FileWrite` is requested for the description and to validateInputs
### LogicalWriteInfo

```scala
info: LogicalWriteInfo
```
### paths

```scala
paths: Seq[String]
```
Used when:

- `FileWrite` is requested for a BatchWrite and to validateInputs
### Preparing Write Job

```scala
prepareWrite(
  sqlConf: SQLConf,
  job: Job,
  options: Map[String, String],
  dataSchema: StructType): OutputWriterFactory
```

Prepares a write job and returns an `OutputWriterFactory`.
Used when:

- `FileWrite` is requested for a BatchWrite (and creates a WriteJobDescription)
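For illustration, a `FileWrite` implementation could provide `prepareWrite` along the following lines. This is a hypothetical sketch (the `MyFormatOutputWriter` writer and the option handling are made up for the example), not code from any built-in format:

```scala
import org.apache.hadoop.mapreduce.{Job, TaskAttemptContext}
import org.apache.spark.sql.execution.datasources.{OutputWriter, OutputWriterFactory}
import org.apache.spark.sql.internal.SQLConf
import org.apache.spark.sql.types.StructType

// Hypothetical prepareWrite: configure the Hadoop Job and return the
// OutputWriterFactory that write tasks use to create per-file writers.
override def prepareWrite(
    sqlConf: SQLConf,
    job: Job,
    options: Map[String, String],
    dataSchema: StructType): OutputWriterFactory = {
  // A format would typically push write options into the Hadoop configuration here
  options.foreach { case (k, v) => job.getConfiguration.set(s"myformat.$k", v) }

  new OutputWriterFactory {
    // Extension appended to the generated file names
    override def getFileExtension(context: TaskAttemptContext): String = ".myformat"

    // Executed on executors: one OutputWriter per output file
    override def newInstance(
        path: String,
        dataSchema: StructType,
        context: TaskAttemptContext): OutputWriter =
      new MyFormatOutputWriter(path, dataSchema, context) // hypothetical writer
  }
}
```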
### supportsDataType

```scala
supportsDataType: DataType => Boolean
```
Used when:

- `FileWrite` is requested to validateInputs
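For example, a `FileWrite` implementation might express `supportsDataType` as a recursive check that accepts atomic types and recurses into nested types. This is a generic sketch (to be placed inside an implementation), not the check of any particular built-in format:

```scala
import org.apache.spark.sql.types._

// Generic sketch: recurse into nested types, accept everything else
override def supportsDataType: DataType => Boolean = isSupported

private def isSupported(dt: DataType): Boolean = dt match {
  case ArrayType(elementType, _)      => isSupported(elementType)
  case MapType(keyType, valueType, _) => isSupported(keyType) && isSupported(valueType)
  case StructType(fields)             => fields.forall(f => isSupported(f.dataType))
  case NullType                       => false // assumption: this format cannot write null-typed columns
  case _                              => true  // atomic types (StringType, IntegerType, ...)
}
```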
## Implementations

- AvroWrite
- CSVWrite
- JsonWrite
- OrcWrite
- ParquetWrite
- TextWrite
## Creating BatchWrite

`toBatch` first validates the inputs (validateInputs).

`toBatch` creates a new Hadoop `Job` for just a single path out of the paths.
`toBatch` creates a `FileCommitProtocol` (Spark Core) with the following:

- `spark.sql.sources.commitProtocolClass`
- A random job ID
- The first of the paths
`toBatch` creates a WriteJobDescription.

`toBatch` requests the `FileCommitProtocol` to `setupJob` (with the Hadoop `Job` instance).

In the end, `toBatch` creates a `FileBatchWrite` (for the Hadoop `Job`, the WriteJobDescription and the `FileCommitProtocol`).
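The flow above could be sketched roughly as follows. This is written from the description on this page, not copied from the Spark source; the `validateInputs` call, the `options` value, how the Hadoop configuration is obtained, and the `FileBatchWrite` constructor arguments are simplified assumptions:

```scala
import org.apache.hadoop.mapreduce.Job
import org.apache.spark.internal.io.FileCommitProtocol
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.connector.write.BatchWrite
import org.apache.spark.sql.execution.datasources.v2.FileBatchWrite
import org.apache.spark.sql.internal.SQLConf

// Simplified sketch of the toBatch flow described above
override def toBatch: BatchWrite = {
  validateInputs() // assumed no-arg here

  // A new Hadoop Job for just a single path out of the paths
  val sparkSession = SparkSession.active
  val pathName = paths.head
  val hadoopConf = sparkSession.sessionState.newHadoopConfWithOptions(options)
  val job = Job.getInstance(hadoopConf)

  // FileCommitProtocol (Spark Core) with the commit protocol class
  // (spark.sql.sources.commitProtocolClass), a random job ID and the first path
  val committer = FileCommitProtocol.instantiate(
    sparkSession.sessionState.conf.getConf(SQLConf.FILE_COMMIT_PROTOCOL_CLASS),
    java.util.UUID.randomUUID().toString, // random job ID
    pathName)                             // the first of the paths

  // WriteJobDescription for the write job
  val description = createWriteJobDescription(
    sparkSession, hadoopConf, job, pathName, options)

  // Set up the job with the commit protocol
  committer.setupJob(job)

  // FileBatchWrite for the Hadoop Job, the WriteJobDescription
  // and the FileCommitProtocol
  new FileBatchWrite(job, description, committer)
}
```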
## Creating WriteJobDescription

```scala
createWriteJobDescription(
  sparkSession: SparkSession,
  hadoopConf: Configuration,
  job: Job,
  pathName: String,
  options: Map[String, String]): WriteJobDescription
```

`createWriteJobDescription`...FIXME