SingleDirectoryDataWriter¶
SingleDirectoryDataWriter is a FileFormatDataWriter used by FileFormatWriter and FileWriterFactory.
Creating Instance¶
SingleDirectoryDataWriter takes the following to be created:

- WriteJobDescription
- Hadoop TaskAttemptContext
- FileCommitProtocol (Spark Core)
- Custom SQLMetrics by name (`Map[String, SQLMetric]`)
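Based on the Spark sources (the exact signature varies across versions), the constructor declaration looks roughly like:

```scala
class SingleDirectoryDataWriter(
    description: WriteJobDescription,
    taskAttemptContext: TaskAttemptContext,
    committer: FileCommitProtocol,
    customMetrics: Map[String, SQLMetric] = Map.empty)
  extends FileFormatDataWriter(
    description, taskAttemptContext, committer, customMetrics)
```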
While being created, SingleDirectoryDataWriter creates a new OutputWriter.
SingleDirectoryDataWriter is created when:
- FileFormatWriter is requested to write data out (in a single Spark task) (of a non-partitioned, non-bucketed write job)
- FileWriterFactory is requested for a DataWriter (of a non-partitioned write job)
recordsInFile Counter¶
SingleDirectoryDataWriter uses recordsInFile counter to track how many records have been written out.
The recordsInFile counter is reset to 0 every time SingleDirectoryDataWriter creates a new OutputWriter, and is incremented for every record written out (up to the maxRecordsPerFile threshold, if defined).
Writing Record Out¶
FileFormatDataWriter
write(
record: InternalRow): Unit
write is part of the FileFormatDataWriter abstraction.
write creates a new OutputWriter when maxRecordsPerFile (of the WriteJobDescription) is positive and the recordsInFile counter has reached that threshold.
write requests the current OutputWriter to write the record and informs the WriteTaskStatsTrackers that there was a new row.
write increments the recordsInFile.
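The steps above can be simulated in a self-contained sketch (stand-in types, not the actual Spark classes): once the recordsInFile counter reaches a positive maxRecordsPerFile, records roll over to a new file.

```scala
import scala.collection.mutable.ArrayBuffer

class DemoWriter(maxRecordsPerFile: Long) {
  private var recordsInFile = 0L
  // each inner buffer stands in for one output file
  val files = ArrayBuffer(ArrayBuffer.empty[String])

  def write(record: String): Unit = {
    if (maxRecordsPerFile > 0 && recordsInFile >= maxRecordsPerFile) {
      recordsInFile = 0                   // "newOutputWriter": reset the counter
      files += ArrayBuffer.empty[String]  // and start a new file
    }
    files.last += record  // the current OutputWriter writes the record
    recordsInFile += 1    // a WriteTaskStatsTracker would see a new row here
  }
}

val w = new DemoWriter(2)
(1 to 5).foreach(i => w.write(s"r$i"))
println(w.files.map(_.size))  // three files with 2 + 2 + 1 records
```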
Creating New OutputWriter¶
newOutputWriter(): Unit
newOutputWriter sets the recordsInFile counter to 0.
newOutputWriter releases resources (closing the current OutputWriter, if any).
newOutputWriter uses the given WriteJobDescription to access the OutputWriterFactory for a file extension (ext).
newOutputWriter requests the given FileCommitProtocol for the path of a new data file (with a -c[fileCounter][ext] suffix, the fileCounter padded to three digits).
newOutputWriter uses the given WriteJobDescription to access the OutputWriterFactory for a new OutputWriter.
newOutputWriter informs the WriteTaskStatsTrackers that a new file is about to be written.
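The suffix of the new data file follows the `f"-c$fileCounter%03d" + ext` pattern of the Spark sources; a minimal sketch (`dataFileSuffix` is a hypothetical helper, not a Spark method):

```scala
// Build the data-file suffix: fileCounter zero-padded to three digits,
// followed by the file format's extension.
def dataFileSuffix(fileCounter: Int, ext: String): String =
  f"-c$fileCounter%03d" + ext

println(dataFileSuffix(0, ".parquet"))  // -c000.parquet
println(dataFileSuffix(12, ".csv"))     // -c012.csv
```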
newOutputWriter is used when:

- SingleDirectoryDataWriter is created
- SingleDirectoryDataWriter is requested to write a record out (with the maxRecordsPerFile threshold reached)