FileWrite

FileWrite is an extension of the Write abstraction for file writers.

Contract

Format Name

formatName: String

Used when:

LogicalWriteInfo

info: LogicalWriteInfo

Used when:

paths

paths: Seq[String]

Used when:

Preparing Write Job

prepareWrite(
  sqlConf: SQLConf,
  job: Job,
  options: Map[String, String],
  dataSchema: StructType): OutputWriterFactory

Prepares a write job and returns an OutputWriterFactory.

Used when:

supportsDataType

supportsDataType: DataType => Boolean

Used when:
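
Taken together, the contract can be sketched as a Scala trait. This is a simplified paraphrase for orientation, not the exact Spark source (the real `FileWrite` has more members, and package locations may differ across Spark versions):

```scala
import org.apache.hadoop.mapreduce.Job
import org.apache.spark.sql.connector.write.{LogicalWriteInfo, Write}
import org.apache.spark.sql.execution.datasources.OutputWriterFactory
import org.apache.spark.sql.internal.SQLConf
import org.apache.spark.sql.types.{DataType, StructType}

// Simplified paraphrase of the FileWrite contract
trait FileWrite extends Write {
  // Short name of the file format (e.g. used in error messages)
  def formatName: String

  // Information about the logical write (schema, options, query ID)
  def info: LogicalWriteInfo

  // Output paths (toBatch supports just a single path)
  def paths: Seq[String]

  // Sets up the write job and gives back a factory of OutputWriters
  def prepareWrite(
      sqlConf: SQLConf,
      job: Job,
      options: Map[String, String],
      dataSchema: StructType): OutputWriterFactory

  // Whether a given column type can be written out by this format
  def supportsDataType: DataType => Boolean
}
```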

Implementations

  • AvroWrite
  • CSVWrite
  • JsonWrite
  • OrcWrite
  • ParquetWrite
  • TextWrite

Creating BatchWrite

toBatch: BatchWrite

toBatch validates the inputs first (validateInputs).

toBatch creates a new Hadoop Job for just a single path (the first of the paths).

toBatch creates a FileCommitProtocol (Spark Core) with the following:

  1. The class name given by the spark.sql.sources.commitProtocolClass configuration property
  2. A random job ID
  3. The first of the paths

toBatch creates a WriteJobDescription.

toBatch requests the FileCommitProtocol to setupJob (with the Hadoop Job instance).

In the end, toBatch creates a FileBatchWrite (for the Hadoop Job, the WriteJobDescription and the FileCommitProtocol).


toBatch is part of the Write abstraction.
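
The steps above can be sketched as follows. This is a paraphrase of the flow, not the exact Spark source; helper names such as `getJobInstance` and the precise `validateInputs` signature are assumptions:

```scala
import java.util.UUID
import org.apache.hadoop.fs.Path
import org.apache.spark.internal.io.FileCommitProtocol
import org.apache.spark.sql.connector.write.BatchWrite

// Paraphrase of FileWrite.toBatch (simplified; error handling omitted)
def toBatch: BatchWrite = {
  validateInputs(sparkSession.sessionState.conf)  // assumed signature

  // A new Hadoop Job for just a single path (the first of the paths)
  val path = new Path(paths.head)
  val job = getJobInstance(hadoopConf, path)      // assumed helper

  // FileCommitProtocol per spark.sql.sources.commitProtocolClass,
  // a random job ID, and the first of the paths
  val committer = FileCommitProtocol.instantiate(
    sparkSession.sessionState.conf.fileCommitProtocolClass,
    jobId = UUID.randomUUID().toString,
    outputPath = paths.head)

  val description = createWriteJobDescription(
    sparkSession, hadoopConf, job, paths.head, options)

  committer.setupJob(job)

  new FileBatchWrite(job, description, committer)
}
```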

Creating WriteJobDescription

createWriteJobDescription(
  sparkSession: SparkSession,
  hadoopConf: Configuration,
  job: Job,
  pathName: String,
  options: Map[String, String]): WriteJobDescription

createWriteJobDescription...FIXME
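
Until the FIXME above is filled in, here is a rough sketch of what createWriteJobDescription could look like, inferred from its signature and from how toBatch uses the result. The WriteJobDescription field names and the use of the logical write's schema are assumptions:

```scala
import java.util.UUID
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.mapreduce.Job
import org.apache.spark.sql.SparkSession
import org.apache.spark.util.SerializableConfiguration

// Hypothetical sketch: build a WriteJobDescription for the write job
def createWriteJobDescription(
    sparkSession: SparkSession,
    hadoopConf: Configuration,
    job: Job,
    pathName: String,
    options: Map[String, String]): WriteJobDescription = {
  // prepareWrite configures the Hadoop Job (as a side effect)
  // and yields the OutputWriter factory
  val outputWriterFactory =
    prepareWrite(sparkSession.sessionState.conf, job, options,
      info.schema())  // schema of the logical write (assumed)

  new WriteJobDescription(
    uuid = UUID.randomUUID().toString,
    serializableHadoopConf = new SerializableConfiguration(job.getConfiguration),
    outputWriterFactory = outputWriterFactory,
    path = pathName /* ...more fields assumed */ )
}
```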