
FileWriterFactory

FileWriterFactory is a DataWriterFactory of FileBatchWrites.

Creating Instance

FileWriterFactory takes the following to be created:

  • WriteJobDescription
  • FileCommitProtocol (Spark Core)

FileWriterFactory is created when:

  • FileBatchWrite is requested for a DataWriterFactory
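The following is a minimal sketch of the constructor with the two arguments listed above (the parameter names description and committer are assumptions for illustration, not necessarily the names in the Spark sources):

import org.apache.spark.internal.io.FileCommitProtocol
import org.apache.spark.sql.execution.datasources.WriteJobDescription

// A sketch only; the real class also implements the DataWriterFactory
// contract (createWriter) described below.
case class FileWriterFactory(
    description: WriteJobDescription,
    committer: FileCommitProtocol)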

Creating DataWriter

createWriter(
  partitionId: Int,
  realTaskId: Long): DataWriter[InternalRow]

createWriter creates a Hadoop TaskAttemptContext for the given partitionId.

createWriter requests the FileCommitProtocol to setupTask (with the TaskAttemptContext).

For a non-partitioned write job (i.e., no partition columns in the WriteJobDescription), createWriter creates a SingleDirectoryDataWriter. Otherwise, createWriter creates a DynamicPartitionDataSingleWriter.
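The following is a minimal sketch of the control flow just described (the description and committer field names are assumptions; refer to the Spark sources for the exact implementation):

override def createWriter(
    partitionId: Int,
    realTaskId: Long): DataWriter[InternalRow] = {
  // Create a Hadoop TaskAttemptContext for the partition
  val taskAttemptContext = createTaskAttemptContext(partitionId)
  // Request the FileCommitProtocol to set up the write task
  committer.setupTask(taskAttemptContext)
  if (description.partitionColumns.isEmpty) {
    // Non-partitioned write job
    new SingleDirectoryDataWriter(description, taskAttemptContext, committer)
  } else {
    // Partitioned write job
    new DynamicPartitionDataSingleWriter(description, taskAttemptContext, committer)
  }
}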


createWriter is part of the DataWriterFactory abstraction.

Creating Hadoop TaskAttemptContext

createTaskAttemptContext(
  partitionId: Int): TaskAttemptContextImpl

createTaskAttemptContext creates a Hadoop JobID.

createTaskAttemptContext creates a Hadoop TaskID (for the JobID and the given partitionId as TaskType.MAP type).

createTaskAttemptContext creates a Hadoop TaskAttemptID (for the TaskID).
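A small sketch of the three Hadoop IDs being built (the jtIdentifier string and the attempt number 0 are assumptions for illustration):

import org.apache.hadoop.mapreduce.{JobID, TaskAttemptID, TaskID, TaskType}

val partitionId = 0                          // the partitionId given to createTaskAttemptContext
val jobId = new JobID("20240101000000", 0)   // the jtIdentifier string is an assumption
val taskId = new TaskID(jobId, TaskType.MAP, partitionId)
val taskAttemptId = new TaskAttemptID(taskId, 0)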

createTaskAttemptContext uses the Hadoop Configuration (from the WriteJobDescription) to set the following properties:

Name                        Value
--------------------------  -------------------
mapreduce.job.id            the JobID
mapreduce.task.id           the TaskID
mapreduce.task.attempt.id   the TaskAttemptID
mapreduce.task.ismap        true
mapreduce.task.partition    0

In the end, createTaskAttemptContext creates a new Hadoop TaskAttemptContextImpl (with the Configuration and the TaskAttemptID).
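A self-contained sketch of setting the properties above and creating the TaskAttemptContextImpl (the fresh Configuration and the ID values are assumptions for illustration; in FileWriterFactory the Configuration comes from the WriteJobDescription):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.mapreduce.{JobID, TaskAttemptID, TaskID, TaskType}
import org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl

val hadoopConf = new Configuration()
val taskAttemptId = new TaskAttemptID(
  new TaskID(new JobID("20240101000000", 0), TaskType.MAP, 0), 0)

// Register the IDs with the Hadoop Configuration
hadoopConf.set("mapreduce.job.id", taskAttemptId.getJobID.toString)
hadoopConf.set("mapreduce.task.id", taskAttemptId.getTaskID.toString)
hadoopConf.set("mapreduce.task.attempt.id", taskAttemptId.toString)
hadoopConf.setBoolean("mapreduce.task.ismap", true)
hadoopConf.setInt("mapreduce.task.partition", 0)

// The Hadoop TaskAttemptContext used by createWriter
val taskAttemptContext = new TaskAttemptContextImpl(hadoopConf, taskAttemptId)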