# FileWriterFactory

FileWriterFactory is a DataWriterFactory of FileBatchWrites.
## Creating Instance

FileWriterFactory takes the following to be created:

- WriteJobDescription
- FileCommitProtocol (Spark Core)

FileWriterFactory is created when:

- FileBatchWrite is requested for a DataWriterFactory
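A minimal sketch of that call site, assuming FileBatchWrite keeps the WriteJobDescription and FileCommitProtocol as fields (trimmed to the relevant method, so not a complete BatchWrite implementation):

```scala
// Inside FileBatchWrite (sketch): the factory is created with the write job
// description and the commit protocol that FileBatchWrite itself was given.
override def createBatchWriterFactory(): DataWriterFactory =
  FileWriterFactory(description, committer)
```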
## Creating DataWriter

```scala
createWriter(
  partitionId: Int,
  realTaskId: Long): DataWriter[InternalRow]
```
createWriter creates a TaskAttemptContext and requests the FileCommitProtocol to setupTask (with the TaskAttemptContext).

For a non-partitioned write job (i.e., no partition columns in the WriteJobDescription), createWriter creates a SingleDirectoryDataWriter. Otherwise, createWriter creates a DynamicPartitionDataSingleWriter.

createWriter is part of the DataWriterFactory abstraction.
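A condensed sketch of that flow, assuming field names (description, committer) and writer constructors that mirror the description above rather than the exact Spark source:

```scala
// Sketch: set up the task through the commit protocol, then pick the writer
// based on whether the write job has partition columns.
override def createWriter(partitionId: Int, realTaskId: Long): DataWriter[InternalRow] = {
  val taskAttemptContext = createTaskAttemptContext(partitionId)
  committer.setupTask(taskAttemptContext)
  if (description.partitionColumns.isEmpty) {
    new SingleDirectoryDataWriter(description, taskAttemptContext, committer)
  } else {
    new DynamicPartitionDataSingleWriter(description, taskAttemptContext, committer)
  }
}
```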
## Creating Hadoop TaskAttemptContext

```scala
createTaskAttemptContext(
  partitionId: Int): TaskAttemptContextImpl
```

createTaskAttemptContext creates a Hadoop JobID.

createTaskAttemptContext creates a Hadoop TaskID (for the JobID and the given partitionId, of TaskType.MAP type).

createTaskAttemptContext creates a Hadoop TaskAttemptID (for the TaskID).
createTaskAttemptContext uses the Hadoop Configuration (from the WriteJobDescription) to set the following properties:

| Name | Value |
|---|---|
| mapreduce.job.id | the JobID |
| mapreduce.task.id | the TaskID |
| mapreduce.task.attempt.id | the TaskAttemptID |
| mapreduce.task.ismap | true |
| mapreduce.task.partition | 0 |

In the end, createTaskAttemptContext creates a new Hadoop TaskAttemptContextImpl (with the Configuration and the TaskAttemptID).
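Putting the steps together, a sketch of the whole method (the Hadoop types and configuration keys are as above; how the JobID identifier is generated and the serializableHadoopConf field are assumptions for illustration):

```scala
import java.text.SimpleDateFormat
import java.util.{Date, Locale}

import org.apache.hadoop.mapreduce.{JobID, TaskAttemptID, TaskID, TaskType}
import org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl

// Sketch: build the Hadoop job/task/attempt IDs, record them in the Hadoop
// Configuration (taken here from the WriteJobDescription), and wrap both in a
// TaskAttemptContextImpl.
private def createTaskAttemptContext(partitionId: Int): TaskAttemptContextImpl = {
  // Assumption: a timestamp-based job tracker identifier for the JobID.
  val jobTrackerId = new SimpleDateFormat("yyyyMMddHHmmss", Locale.US).format(new Date())
  val jobId = new JobID(jobTrackerId, 0)
  val taskId = new TaskID(jobId, TaskType.MAP, partitionId)
  val taskAttemptId = new TaskAttemptID(taskId, 0)

  // Assumption: the WriteJobDescription carries the Hadoop Configuration
  // behind a serializable wrapper.
  val hadoopConf = description.serializableHadoopConf.value
  hadoopConf.set("mapreduce.job.id", jobId.toString)
  hadoopConf.set("mapreduce.task.id", taskId.toString)
  hadoopConf.set("mapreduce.task.attempt.id", taskAttemptId.toString)
  hadoopConf.setBoolean("mapreduce.task.ismap", true)
  hadoopConf.setInt("mapreduce.task.partition", 0)

  new TaskAttemptContextImpl(hadoopConf, taskAttemptId)
}
```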