# SingleDirectoryDataWriter

`SingleDirectoryDataWriter` is a FileFormatDataWriter that is used by FileFormatWriter and FileWriterFactory.
## Creating Instance

`SingleDirectoryDataWriter` takes the following to be created:

* WriteJobDescription
* Hadoop TaskAttemptContext
* FileCommitProtocol (Spark Core)
* Custom SQLMetrics by name (`Map[String, SQLMetric]`)
While being created, `SingleDirectoryDataWriter` creates a new OutputWriter.

`SingleDirectoryDataWriter` is created when:

* `FileFormatWriter` is requested to write data out (in a single Spark task) of a non-partitioned, non-bucketed write job
* `FileWriterFactory` is requested for a DataWriter of a non-partitioned write job
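The creation signature above can be sketched in plain Scala, with stub types standing in for Spark's classes (a simplified model, not the actual Spark source):

```scala
// Stub types standing in for Spark's actual classes (illustrative only)
trait WriteJobDescription
trait TaskAttemptContext
trait FileCommitProtocol
class SQLMetric

// Hypothetical model of the constructor shape
class SingleDirectoryDataWriter(
    description: WriteJobDescription,
    taskAttemptContext: TaskAttemptContext,
    committer: FileCommitProtocol,
    val customMetrics: Map[String, SQLMetric] = Map.empty) {

  // While being created, a new OutputWriter is opened (modeled as a flag here)
  val currentWriterOpen: Boolean = true
}
```

Note that the custom metrics default to an empty map, so only the first three arguments are required.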
## recordsInFile Counter

`SingleDirectoryDataWriter` uses a `recordsInFile` counter to track how many records have been written out.

The `recordsInFile` counter is reset to `0` whenever `SingleDirectoryDataWriter` creates a new OutputWriter, and is then incremented for every record written (up to the `maxRecordsPerFile` threshold, if defined).
## Writing Record Out

```scala
write(
  record: InternalRow): Unit
```

`write` is part of the FileFormatDataWriter abstraction.

`write` creates a new OutputWriter when `maxRecordsPerFile` (of the WriteJobDescription) is positive and the `recordsInFile` counter has reached the threshold.

`write` requests the current OutputWriter to write out the record and informs the WriteTaskStatsTrackers that a new row was written.

In the end, `write` increments the `recordsInFile` counter.
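The rollover behavior above can be modeled in a self-contained sketch (names and structure are illustrative, not Spark's actual implementation):

```scala
// Simplified model of write's rollover logic:
// a new file is started once recordsInFile reaches maxRecordsPerFile
class SingleDirectoryWriterModel(maxRecordsPerFile: Long) {
  var recordsInFile: Long = 0L
  var filesWritten: Int = 1 // an OutputWriter is opened at creation time

  private def newOutputWriter(): Unit = {
    recordsInFile = 0 // counter is reset for the new file
    filesWritten += 1
  }

  def write(record: Any): Unit = {
    // Roll over only when maxRecordsPerFile is positive and the threshold is reached
    if (maxRecordsPerFile > 0 && recordsInFile >= maxRecordsPerFile) {
      newOutputWriter()
    }
    // ...the current OutputWriter would write the record here,
    // and the WriteTaskStatsTrackers would be notified of the new row...
    recordsInFile += 1
  }
}
```

With `maxRecordsPerFile = 2`, writing five records produces three files holding 2, 2 and 1 records, respectively; a non-positive `maxRecordsPerFile` keeps everything in a single file.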
## Creating New OutputWriter

```scala
newOutputWriter(): Unit
```

`newOutputWriter` sets the `recordsInFile` counter to `0`.

`newOutputWriter` releases the resources of the current OutputWriter (releaseResources).

`newOutputWriter` requests the OutputWriterFactory (of the given WriteJobDescription) for the file extension (`ext`).

`newOutputWriter` requests the given FileCommitProtocol for the path of a new data file (with a `-c[fileCounter][nnn][ext]` suffix).
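Assuming `[nnn]` denotes the file counter zero-padded to three digits, the suffix shape can be illustrated with a hypothetical helper (not Spark's actual code):

```scala
// Hypothetical helper illustrating the "-c[fileCounter][nnn][ext]" suffix shape,
// assuming the file counter is zero-padded to three digits
def newFileSuffix(fileCounter: Int, ext: String): String =
  f"-c$fileCounter%03d$ext"
```

For example, `newFileSuffix(0, ".parquet")` yields `-c000.parquet`.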
`newOutputWriter` requests the OutputWriterFactory (of the given WriteJobDescription) for a new OutputWriter.

`newOutputWriter` informs the WriteTaskStatsTrackers that a new file is about to be written.
`newOutputWriter` is used when: