FileWriterFactory¶
FileWriterFactory is a DataWriterFactory of FileBatchWrites.
Creating Instance¶
FileWriterFactory takes the following to be created:
- WriteJobDescription
- FileCommitProtocol (Spark Core)
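
A minimal sketch of the factory's shape under these assumptions (the real class is internal to Spark SQL and not part of the public API; the argument names below simply follow the list above):

```scala
import org.apache.spark.internal.io.FileCommitProtocol
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.connector.write.{DataWriter, DataWriterFactory}
import org.apache.spark.sql.execution.datasources.WriteJobDescription

// Sketch only: the two constructor arguments mirror the list above.
case class FileWriterFactory(
    description: WriteJobDescription,
    committer: FileCommitProtocol) extends DataWriterFactory {

  // Covered in "Creating DataWriter" below
  override def createWriter(
      partitionId: Int,
      realTaskId: Long): DataWriter[InternalRow] = ???
}
```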
FileWriterFactory is created when:
FileBatchWrite is requested for a DataWriterFactory
Creating DataWriter¶
```scala
createWriter(
  partitionId: Int,
  realTaskId: Long): DataWriter[InternalRow]
```
createWriter creates a Hadoop TaskAttemptContext (for the given partitionId).
createWriter requests the FileCommitProtocol to setupTask (with the TaskAttemptContext).
For a non-partitioned write job (i.e., no partition columns in the WriteJobDescription), createWriter creates a SingleDirectoryDataWriter. Otherwise, createWriter creates a DynamicPartitionDataSingleWriter.
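
A sketch of that dispatch, assuming the `description` and `committer` constructor arguments above and the `createTaskAttemptContext` helper described in the next section (the writer class names are those this page mentions; their constructor parameters are simplified assumptions):

```scala
override def createWriter(
    partitionId: Int,
    realTaskId: Long): DataWriter[InternalRow] = {
  // Hadoop task context for this partition (see "Creating Hadoop TaskAttemptContext")
  val taskAttemptContext = createTaskAttemptContext(partitionId)
  // Let the commit protocol prepare this task attempt
  committer.setupTask(taskAttemptContext)
  if (description.partitionColumns.isEmpty) {
    // Non-partitioned write job
    new SingleDirectoryDataWriter(description, taskAttemptContext, committer)
  } else {
    // Partitioned write job
    new DynamicPartitionDataSingleWriter(description, taskAttemptContext, committer)
  }
}
```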
createWriter is part of the DataWriterFactory abstraction.
Creating Hadoop TaskAttemptContext¶
```scala
createTaskAttemptContext(
  partitionId: Int): TaskAttemptContextImpl
```
createTaskAttemptContext creates a Hadoop JobID.
createTaskAttemptContext creates a Hadoop TaskID (for the JobID, the given partitionId and the TaskType.MAP task type).
createTaskAttemptContext creates a Hadoop TaskAttemptID (for the TaskID).
createTaskAttemptContext uses the Hadoop Configuration (from the WriteJobDescription) to set the following properties:
| Name | Value |
|---|---|
| mapreduce.job.id | the JobID |
| mapreduce.task.id | the TaskID |
| mapreduce.task.attempt.id | the TaskAttemptID |
| mapreduce.task.ismap | true |
| mapreduce.task.partition | 0 |
In the end, createTaskAttemptContext creates a new Hadoop TaskAttemptContextImpl (with the Configuration and the TaskAttemptID).
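
Putting the steps together, a hedged sketch (field and helper names such as `serializableHadoopConf` and the `jobTrackerID` string are assumptions; the actual implementation relies on Spark's internal Hadoop-writer utilities to build the JobID):

```scala
import org.apache.hadoop.mapreduce.{JobID, TaskAttemptID, TaskID, TaskType}
import org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl

private def createTaskAttemptContext(partitionId: Int): TaskAttemptContextImpl = {
  val hadoopConf = description.serializableHadoopConf.value
  // Job, task and task-attempt identifiers for this partition
  val jobId = new JobID(jobTrackerID, partitionId)  // jobTrackerID: assumed identifier string
  val taskId = new TaskID(jobId, TaskType.MAP, partitionId)
  val taskAttemptId = new TaskAttemptID(taskId, 0)
  // Expose the identifiers to Hadoop output committers via the Configuration
  hadoopConf.set("mapreduce.job.id", jobId.toString)
  hadoopConf.set("mapreduce.task.id", taskAttemptId.getTaskID.toString)
  hadoopConf.set("mapreduce.task.attempt.id", taskAttemptId.toString)
  hadoopConf.setBoolean("mapreduce.task.ismap", true)
  hadoopConf.setInt("mapreduce.task.partition", 0)
  new TaskAttemptContextImpl(hadoopConf, taskAttemptId)
}
```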