# FileFormatDataWriter

`FileFormatDataWriter` is an extension of the `DataWriter` abstraction for data writers (of `InternalRow`s).
## Contract

### Writing Record Out

`write(record: InternalRow): Unit`

Writes out the given `InternalRow`. See the Implementations below.

Note

`write` is a concrete type variant of `DataWriter`'s `write` (with `T` as `InternalRow`).
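For illustration only, here is a hypothetical writer that fulfils the `write` contract by buffering rows in memory; neither the class nor the buffer exists in Spark, they merely show the shape of an implementation:

```scala
import scala.collection.mutable.ArrayBuffer

import org.apache.spark.sql.catalyst.InternalRow

// Hypothetical, for illustration only: fulfils the write contract by
// keeping copies of the incoming rows in memory.
class InMemoryRowWriterSketch {
  private val rows = ArrayBuffer.empty[InternalRow]

  def write(record: InternalRow): Unit = {
    // Callers may reuse the same InternalRow instance, so copy before retaining it.
    rows += record.copy()
  }
}
```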
## Implementations

- BaseDynamicPartitionDataWriter
- EmptyDirectoryDataWriter
- SingleDirectoryDataWriter
## Creating Instance

`FileFormatDataWriter` takes the following to be created:

- WriteJobDescription
- TaskAttemptContext (Apache Hadoop)
- FileCommitProtocol (Spark Core)
- Custom SQLMetrics by name (`Map[String, SQLMetric]`)

Abstract Class

`FileFormatDataWriter` is an abstract class and cannot be created directly. It is created only indirectly, as one of the concrete FileFormatDataWriters.
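Putting the constructor parameters together, a simplified outline of the class shape could look as follows (a sketch, not the actual Spark source; the WriteJobDescription type is elided and the class name is illustrative):

```scala
import org.apache.hadoop.mapreduce.TaskAttemptContext

import org.apache.spark.internal.io.FileCommitProtocol
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.execution.metric.SQLMetric

// Simplified outline (not the actual Spark source), assembled from the
// constructor parameters and the write contract described above.
abstract class FileFormatDataWriterSketch(
    description: AnyRef,                      // WriteJobDescription (type elided in this sketch)
    taskAttemptContext: TaskAttemptContext,   // Apache Hadoop task attempt context
    committer: FileCommitProtocol,            // Spark Core commit protocol
    customMetrics: Map[String, SQLMetric]) {  // custom SQLMetrics by name

  // The abstract contract: concrete writers decide how a record is written out.
  def write(record: InternalRow): Unit
}
```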
## writeWithMetrics

`writeWithMetrics(record: InternalRow, count: Long): Unit`

`writeWithMetrics` updates the custom SQLMetrics (customMetrics) with the current values of the CustomTaskMetrics, and writes out the given `InternalRow`.

`writeWithMetrics` is used when:

- `FileFormatDataWriter` is requested to write out (a collection of) records
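A minimal sketch of the behaviour described above; the metrics-refresh helper and the periodic update interval are assumptions for illustration, not taken from this page:

```scala
import org.apache.spark.sql.catalyst.InternalRow

// Sketch only (not the actual Spark source). The abstract members stand in
// for the writer's real state and helpers.
trait MetricsAwareWriterSketch {
  def write(record: InternalRow): Unit   // the write contract
  def updateCustomMetrics(): Unit        // hypothetical: copy CustomTaskMetric values into the SQLMetrics
  def rowsPerMetricsUpdate: Long         // hypothetical refresh interval

  def writeWithMetrics(record: InternalRow, count: Long): Unit = {
    // Refresh the custom SQL metrics periodically rather than on every row.
    if (count % rowsPerMetricsUpdate == 0) updateCustomMetrics()
    // Then write out the record itself.
    write(record)
  }
}
```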
## Writing Out (Collection of) Records

`writeWithIterator(iterator: Iterator[InternalRow]): Unit`

`writeWithIterator`...FIXME

`writeWithIterator` is used when:

- `FileFormatWriter` utility is used to write data out in a single Spark task
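The description above is still to be filled in, but given that writeWithMetrics is used when `FileFormatDataWriter` writes out a collection of records, a plausible sketch (an assumption, not confirmed by this page) is a loop that delegates every row to writeWithMetrics with a running count:

```scala
import org.apache.spark.sql.catalyst.InternalRow

// Sketch only (not the actual Spark source).
trait IteratorWriterSketch {
  def writeWithMetrics(record: InternalRow, count: Long): Unit  // see the previous section

  def writeWithIterator(iterator: Iterator[InternalRow]): Unit = {
    var count = 0L
    while (iterator.hasNext) {
      // Delegate every row to writeWithMetrics, keeping a running row count.
      writeWithMetrics(iterator.next(), count)
      count += 1
    }
  }
}
```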
## Committing Successful Write

`commit` releases the resources of the writer (releaseResources).

`commit` requests the FileCommitProtocol to commitTask (that gives a `TaskCommitMessage`).

`commit` creates a new `ExecutedWriteSummary` with the updatedPartitions and the WriteTaskStats of the WriteTaskStatsTrackers.

In the end, `commit` creates a `WriteTaskResult` (for the `TaskCommitMessage` and the `ExecutedWriteSummary`).
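Putting the steps above together, a simplified sketch of `commit` could look as follows; the members (committer, taskAttemptContext, updatedPartitions, finalStats, releaseResources) and the Sketch-suffixed result types are illustrative stand-ins, not Spark's actual definitions:

```scala
import org.apache.hadoop.mapreduce.TaskAttemptContext

import org.apache.spark.internal.io.FileCommitProtocol
import org.apache.spark.internal.io.FileCommitProtocol.TaskCommitMessage

// Illustrative stand-ins for ExecutedWriteSummary and WriteTaskResult.
case class ExecutedWriteSummarySketch(updatedPartitions: Set[String], stats: Seq[AnyRef])
case class WriteTaskResultSketch(commitMsg: TaskCommitMessage, summary: ExecutedWriteSummarySketch)

trait CommittingWriterSketch {
  def committer: FileCommitProtocol
  def taskAttemptContext: TaskAttemptContext
  def updatedPartitions: Set[String]   // partitions updated by this writer (assumption)
  def finalStats(): Seq[AnyRef]        // WriteTaskStats of the WriteTaskStatsTrackers (assumption)
  def releaseResources(): Unit

  def commit(): WriteTaskResultSketch = {
    // 1. Release the writer's resources (e.g. close the current output writer).
    releaseResources()
    // 2. Request the FileCommitProtocol to commit the task; that gives a TaskCommitMessage.
    val commitMsg: TaskCommitMessage = committer.commitTask(taskAttemptContext)
    // 3. Summarize the write: updated partitions and the final stats of the stats trackers.
    val summary = ExecutedWriteSummarySketch(updatedPartitions, finalStats())
    // 4. Wrap the TaskCommitMessage and the summary into the task result.
    WriteTaskResultSketch(commitMsg, summary)
  }
}
```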