SingleDirectoryDataWriter takes the following to be created:

- WriteJobDescription
- Hadoop TaskAttemptContext
- FileCommitProtocol
- Custom SQLMetrics by name (default: empty)
While being created,
SingleDirectoryDataWriter creates a new OutputWriter.
SingleDirectoryDataWriter is created when:

- FileFormatWriter is requested to write data out (in a single Spark task) of a non-partitioned, non-bucketed write job
- FileWriterFactory is requested for a DataWriter of a non-partitioned write job
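The shape of the constructor can be sketched as follows (a sketch based on the Apache Spark sources; the exact parameter names, defaults, and the superclass signature are assumptions that may vary across Spark versions):

```scala
// Sketch only: mirrors the Spark-internal class, not a standalone program.
class SingleDirectoryDataWriter(
    description: WriteJobDescription,
    taskAttemptContext: TaskAttemptContext,
    committer: FileCommitProtocol,
    customMetrics: Map[String, SQLMetric] = Map.empty)
  extends FileFormatDataWriter(
    description, taskAttemptContext, committer, customMetrics) {

  // While being created, immediately opens the first OutputWriter
  newOutputWriter()
}
```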
recordsInFile Counter

SingleDirectoryDataWriter uses a recordsInFile counter to track how many records have been written out to the current output file.

recordsInFile counter is incremented every time a record is written out and reset to 0 when a new OutputWriter is created.

When the counter reaches the maxRecordsPerFile threshold (if defined), SingleDirectoryDataWriter creates a new OutputWriter and so rolls over to a new data file.
Writing Record Out

write(record: InternalRow): Unit

write writes the given record out (using the current OutputWriter) and increments the recordsInFile counter. If the maxRecordsPerFile threshold is defined and has been reached, write first creates a new OutputWriter.
write is part of the FileFormatDataWriter abstraction.
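The behaviour above can be sketched as follows (a simplified sketch assuming the Spark-internal fields `description`, `currentWriter`, `statsTrackers` and the helper `newOutputWriter`; the actual implementation also tracks a file counter for the generated file names):

```scala
// Sketch only: simplified from the Spark-internal method.
override def write(record: InternalRow): Unit = {
  if (description.maxRecordsPerFile > 0 &&
      recordsInFile >= description.maxRecordsPerFile) {
    // Threshold reached: roll over to a new output file
    newOutputWriter()
  }
  currentWriter.write(record)
  statsTrackers.foreach(_.newRow(currentWriter.path, record))
  recordsInFile += 1
}
```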
Creating New OutputWriter

newOutputWriter(): Unit

newOutputWriter sets the recordsInFile counter to 0.

newOutputWriter uses the given WriteJobDescription to access the OutputWriterFactory for a file extension (given the TaskAttemptContext).

newOutputWriter requests the given FileCommitProtocol for a path of a new data file (with the file extension) and requests the OutputWriterFactory for a new OutputWriter for that path.
newOutputWriter is used when:

- SingleDirectoryDataWriter is created and requested to write a record out
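The steps above can be sketched as follows (a simplified sketch based on the Spark sources; in the actual implementation the file extension is also prefixed with a per-task file counter, which is omitted here):

```scala
// Sketch only: simplified from the Spark-internal method.
private def newOutputWriter(): Unit = {
  recordsInFile = 0
  releaseResources() // close the previous OutputWriter, if any

  // The OutputWriterFactory decides the file extension (e.g. ".parquet")
  val ext = description.outputWriterFactory
    .getFileExtension(taskAttemptContext)

  // The FileCommitProtocol decides where the new data file goes
  val currentPath = committer.newTaskTempFile(
    taskAttemptContext, None, ext)

  currentWriter = description.outputWriterFactory.newInstance(
    path = currentPath,
    dataSchema = description.dataColumns.toStructType,
    context = taskAttemptContext)

  statsTrackers.foreach(_.newFile(currentPath))
}
```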