FileFormatDataWriter¶
FileFormatDataWriter is an extension of the DataWriter abstraction for data writers (of InternalRows).
Contract¶
Writing Record Out¶
write(
record: InternalRow): Unit
Note
write is a type-specific variant of DataWriter's write (with T fixed to InternalRow)
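A minimal sketch (with simplified stand-in types, not Spark's actual classes) of how the contract fixes DataWriter's type parameter to InternalRow:

```scala
// Simplified model of the write contract; InternalRow and DataWriter are
// stand-ins for Spark's classes, used here for illustration only.
object WriteContractSketch {
  type InternalRow = Seq[Any]

  trait DataWriter[T] {
    def write(record: T): Unit
  }

  // FileFormatDataWriter fixes T to InternalRow, so every concrete
  // file-format writer only ever deals with InternalRows.
  abstract class FileFormatDataWriter extends DataWriter[InternalRow] {
    def write(record: InternalRow): Unit
  }
}
```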
Implementations¶
- BaseDynamicPartitionDataWriter
- EmptyDirectoryDataWriter
- SingleDirectoryDataWriter
Creating Instance¶
FileFormatDataWriter takes the following to be created:
- WriteJobDescription
- TaskAttemptContext (Apache Hadoop)
- FileCommitProtocol (Spark Core)
- Custom SQLMetrics by name (Map[String, SQLMetric])
Abstract Class
FileFormatDataWriter is an abstract class and cannot be created directly. It is created indirectly as one of the concrete FileFormatDataWriters.
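A rough sketch of the constructor shape described above, using placeholder types for the Spark and Hadoop classes (not their actual definitions):

```scala
// Placeholder types standing in for the real Spark and Hadoop classes.
object CreatingInstanceSketch {
  class WriteJobDescription
  class TaskAttemptContext
  class FileCommitProtocol
  class SQLMetric(var value: Long = 0L)

  // The abstract class keeps the four constructor arguments around for
  // its concrete subclasses (single-directory, dynamic-partition, ...).
  abstract class FileFormatDataWriter(
      val description: WriteJobDescription,
      val taskAttemptContext: TaskAttemptContext,
      val committer: FileCommitProtocol,
      val customMetrics: Map[String, SQLMetric]) {
    def write(record: Seq[Any]): Unit
  }
}
```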
writeWithMetrics¶
writeWithMetrics(
record: InternalRow,
count: Long): Unit
writeWithMetrics updates the custom SQLMetrics (from the current CustomTaskMetric values) and writes out the given InternalRow.
writeWithMetrics is used when:
- FileFormatDataWriter is requested to write out (a collection of) records
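A simplified sketch of the writeWithMetrics flow; the update interval and the metrics-refresh helper below are assumptions for illustration, not Spark's actual code:

```scala
// Hypothetical, simplified writeWithMetrics: refresh the custom SQL metrics
// every fixed number of records, then write the record out.
object WriteWithMetricsSketch {
  class SQLMetric(var value: Long = 0L)

  abstract class Writer(val customMetrics: Map[String, SQLMetric]) {
    private val rowsPerMetricsUpdate = 100L  // assumed interval, not Spark's constant

    def write(record: Seq[Any]): Unit           // single-record write (the contract)
    protected def refreshMetrics(): Unit = ()   // placeholder for the metrics update

    def writeWithMetrics(record: Seq[Any], count: Long): Unit = {
      if (count % rowsPerMetricsUpdate == 0L) refreshMetrics()
      write(record)
    }
  }
}
```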
Writing Out (Collection of) Records¶
writeWithIterator(
iterator: Iterator[InternalRow]): Unit
writeWithIterator writes out every record from the given iterator using writeWithMetrics (with a running record count).
writeWithIterator is used when:
- FileFormatWriter utility is used to write data out in a single Spark task
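A simplified sketch of how writeWithIterator can drive writeWithMetrics with a running record count; the stand-in types and stubbed methods are assumptions, not Spark's actual code:

```scala
// Hypothetical, simplified writeWithIterator: write every record from the
// iterator, passing along how many records have been written so far.
object WriteWithIteratorSketch {
  type Row = Seq[Any]  // stand-in for InternalRow

  def writeWithMetrics(record: Row, count: Long): Unit = {
    // metrics refresh + single-record write would happen here
  }

  def writeWithIterator(iterator: Iterator[Row]): Unit = {
    var count = 0L
    while (iterator.hasNext) {
      writeWithMetrics(iterator.next(), count)
      count += 1
    }
    // a final metrics refresh typically follows the loop
  }
}
```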
Committing Successful Write¶
commit starts by releasing resources (releaseResources).
commit requests the FileCommitProtocol to commitTask (that gives a TaskCommitMessage).
commit creates a new ExecutedWriteSummary with the updatedPartitions and the WriteTaskStats of the WriteTaskStatsTrackers.
In the end, commit creates a WriteTaskResult (for the TaskCommitMessage and the ExecutedWriteSummary).
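A simplified sketch of the commit flow described above; every type and signature below is a stand-in for the corresponding Spark class, not its actual definition:

```scala
// Hypothetical, simplified commit: release resources, commit the task via the
// commit protocol, gather per-task write stats, and wrap everything up.
object CommitSketch {
  class TaskCommitMessage
  case class ExecutedWriteSummary(updatedPartitions: Set[String], stats: Seq[String])
  case class WriteTaskResult(commitMsg: TaskCommitMessage, summary: ExecutedWriteSummary)

  trait FileCommitProtocol { def commitTask(): TaskCommitMessage }
  trait WriteTaskStatsTracker { def getFinalStats(): String }

  def commit(
      releaseResources: () => Unit,
      committer: FileCommitProtocol,
      updatedPartitions: Set[String],
      statsTrackers: Seq[WriteTaskStatsTracker]): WriteTaskResult = {
    releaseResources()                      // 1. close any open output writers
    val commitMsg = committer.commitTask()  // 2. commit the task
    val summary = ExecutedWriteSummary(     // 3. collect the per-task write stats
      updatedPartitions, statsTrackers.map(_.getFinalStats()))
    WriteTaskResult(commitMsg, summary)     // 4. wrap as the task result
  }
}
```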