FileWrite¶
FileWrite is an extension of the Write abstraction for file writers.
Contract¶
Format Name¶
formatName: String
Used when:
FileWrite is requested for the description and to validateInputs
LogicalWriteInfo¶
info: LogicalWriteInfo
paths¶
paths: Seq[String]
Used when:
FileWrite is requested for a BatchWrite and to validateInputs
Preparing Write Job¶
prepareWrite(
sqlConf: SQLConf,
job: Job,
options: Map[String, String],
dataSchema: StructType): OutputWriterFactory
Prepares a write job and returns an OutputWriterFactory.
Used when:
FileWrite is requested for a BatchWrite (and creates a WriteJobDescription)
supportsDataType¶
supportsDataType: DataType => Boolean
Used when:
FileWrite is requested to validateInputs
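To make the role of a `DataType => Boolean` predicate more concrete, the following is a minimal, Spark-free sketch of how such a predicate could be used to check the fields of a schema. The `DataType` hierarchy, `Field`, and `unsupportedFields` are simplified stand-ins invented for this example, not Spark's classes, and the checks that validateInputs actually performs may differ.

```scala
// Simplified stand-ins for a type system (hypothetical, for illustration only).
sealed trait DataType
case object StringType extends DataType
case object IntegerType extends DataType
case object BinaryType extends DataType
final case class ArrayType(elementType: DataType) extends DataType

// A named column with its type (hypothetical stand-in for a schema field).
final case class Field(name: String, dataType: DataType)

object SupportsDataTypeSketch {
  // A predicate in the shape of `supportsDataType: DataType => Boolean`.
  // This hypothetical text-like format accepts strings only.
  val textLikeSupports: DataType => Boolean = {
    case StringType => true
    case _          => false
  }

  // A hypothetical format that rejects binary data and
  // accepts arrays only when their element type is accepted.
  def nestedSupports(dataType: DataType): Boolean = dataType match {
    case ArrayType(elementType) => nestedSupports(elementType)
    case BinaryType             => false
    case _                      => true
  }

  // Returns the names of the fields the given predicate rejects.
  def unsupportedFields(
      schema: Seq[Field],
      supportsDataType: DataType => Boolean): Seq[String] =
    schema.filterNot(field => supportsDataType(field.dataType)).map(_.name)
}
```

With a schema of `id: IntegerType`, `tags: ArrayType(StringType)` and `raw: BinaryType`, the text-like predicate rejects all three fields, while the nested one rejects only `raw`. Modelling the check as a plain function keeps each format's type support in one place and lets the validation logic stay shared.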
Implementations¶
- AvroWrite
- CSVWrite
- JsonWrite
- OrcWrite
- ParquetWrite
- TextWrite
Creating BatchWrite¶
toBatch first validates the inputs (validateInputs).
toBatch creates a new Hadoop Job for a single path only (the first of the paths).
toBatch creates a FileCommitProtocol (Spark Core) with the following:
- spark.sql.sources.commitProtocolClass
- A random job ID
- The first of the paths
toBatch creates a WriteJobDescription.
toBatch requests the FileCommitProtocol to setupJob (with the Hadoop Job instance).
In the end, toBatch creates a FileBatchWrite (for the Hadoop Job, the WriteJobDescription and the FileCommitProtocol).
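The order of the steps above can be sketched in plain Scala as follows. This is only an illustrative model: `Job`, `WriteJobDescription`, `CommitProtocol`, and `BatchWriteSketch` are hypothetical stand-ins defined here, not the Hadoop or Spark classes of similar names, and the validation step is reduced to a placeholder check.

```scala
import java.util.UUID

// Hypothetical, simplified stand-ins; none of these are Spark or Hadoop classes.
final case class Job(outputPath: String)
final case class WriteJobDescription(path: String, options: Map[String, String])

final class CommitProtocol(
    val className: String,
    val jobId: String,
    val outputPath: String) {
  // Records whether setupJob has been called.
  var jobSetUp: Boolean = false
  def setupJob(job: Job): Unit = {
    jobSetUp = true
  }
}

final case class BatchWriteSketch(
    job: Job,
    description: WriteJobDescription,
    committer: CommitProtocol)

object ToBatchSketch {
  def toBatch(
      paths: Seq[String],
      options: Map[String, String],
      commitProtocolClass: String): BatchWriteSketch = {
    // 1. Validate the inputs (placeholder check, an assumption of this sketch).
    require(paths.nonEmpty, "at least one output path is required")
    val path = paths.head

    // 2. Create a job for just the first of the paths.
    val job = Job(path)

    // 3. Create the commit protocol from the configured class name,
    //    a random job ID and the first of the paths.
    val committer =
      new CommitProtocol(commitProtocolClass, UUID.randomUUID().toString, path)

    // 4. Create the write job description.
    val description = WriteJobDescription(path, options)

    // 5. Let the commit protocol set up the job.
    committer.setupJob(job)

    // 6. Bundle the job, the description and the commit protocol.
    BatchWriteSketch(job, description, committer)
  }
}
```

For example, `ToBatchSketch.toBatch(Seq("/tmp/out"), Map.empty, "com.example.MyCommitProtocol")` (the class name is made up) returns a value whose job, description and commit protocol all point at `/tmp/out`, with the commit protocol already set up.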
Creating WriteJobDescription¶
createWriteJobDescription(
sparkSession: SparkSession,
hadoopConf: Configuration,
job: Job,
pathName: String,
options: Map[String, String]): WriteJobDescription
createWriteJobDescription...FIXME