
SingleDirectoryDataWriter

SingleDirectoryDataWriter is a FileFormatDataWriter for FileFormatWriter and FileWriterFactory.

Creating Instance

SingleDirectoryDataWriter takes the following to be created:

  • WriteJobDescription
  • TaskAttemptContext
  • FileCommitProtocol

While being created, SingleDirectoryDataWriter creates a new OutputWriter.

SingleDirectoryDataWriter is created when:

  • FileFormatWriter is requested to executeTask
  • FileWriterFactory is requested to createWriter

recordsInFile Counter

SingleDirectoryDataWriter uses the recordsInFile counter to track how many records have been written out to the current file.

The recordsInFile counter is reset to 0 whenever SingleDirectoryDataWriter creates a new OutputWriter and is incremented with every record written, up to the maxRecordsPerFile threshold (if defined).
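
As a quick worked example of the threshold, the hypothetical helper below (not a Spark API) computes how many data files a single task writes. It assumes the task writes at least one record and that maxRecordsPerFile is the only reason to roll over to a new file:

```scala
// Hypothetical helper (not a Spark API): how many data files one task writes
// when the recordsInFile counter rolls over at maxRecordsPerFile.
// Assumes numRecords >= 1, since the first OutputWriter is created up front.
def expectedFileCount(numRecords: Long, maxRecordsPerFile: Long): Long = {
  require(numRecords >= 1, "assumes the task writes at least one record")
  if (maxRecordsPerFile > 0) {
    // Each full batch of maxRecordsPerFile records fills one file, and a
    // partial last batch still needs a file of its own (ceiling division).
    (numRecords + maxRecordsPerFile - 1) / maxRecordsPerFile
  } else {
    // A non-positive threshold disables rollover: all records go to one file.
    1L
  }
}
```

For instance, 5 records with a maxRecordsPerFile of 2 give 3 files (2 + 2 + 1 records), while a maxRecordsPerFile of 0 keeps all 5 records in a single file.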

Writing Record Out

write(
  record: InternalRow): Unit

write is part of the FileFormatDataWriter abstraction.

write increments the file counter and creates a new OutputWriter when maxRecordsPerFile (of the WriteJobDescription) is positive and the recordsInFile counter has reached it.

write requests the current OutputWriter to write the record and informs the WriteTaskStatsTrackers that there was a new row.

write increments the recordsInFile counter.
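
The rollover logic of write can be sketched as a self-contained Scala model. This is a simplification under stated assumptions, not Spark's code: rows are plain Strings rather than InternalRows, InMemoryOutputWriter and SimplifiedSingleDirectoryWriter are hypothetical names, file names come from a made-up part-c... scheme instead of a FileCommitProtocol, and stats trackers are left out.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical stand-in for an OutputWriter that buffers rows in memory.
final class InMemoryOutputWriter(val path: String) {
  val rows = ListBuffer.empty[String]
  var closed = false
  def write(row: String): Unit = { require(!closed, s"$path is closed"); rows += row }
  def close(): Unit = { closed = true }
}

// Simplified, hypothetical model of SingleDirectoryDataWriter.write.
final class SimplifiedSingleDirectoryWriter(maxRecordsPerFile: Long, ext: String) {
  private var fileCounter = 0
  private var recordsInFile = 0L
  private var currentWriter: InMemoryOutputWriter = null
  val writers = ListBuffer.empty[InMemoryOutputWriter]

  // The first writer is created eagerly, as when the data writer is created.
  newOutputWriter()

  private def newOutputWriter(): Unit = {
    recordsInFile = 0
    if (currentWriter != null) currentWriter.close() // release the previous writer
    // Made-up naming scheme; only the zero-padded counter mirrors the text.
    currentWriter = new InMemoryOutputWriter(f"part-c$fileCounter%03d" + ext)
    writers += currentWriter
  }

  def write(row: String): Unit = {
    // Roll over to a new file once the current one holds maxRecordsPerFile rows.
    if (maxRecordsPerFile > 0 && recordsInFile >= maxRecordsPerFile) {
      fileCounter += 1
      newOutputWriter()
    }
    currentWriter.write(row)
    recordsInFile += 1
  }

  def close(): Unit =
    if (currentWriter != null) { currentWriter.close(); currentWriter = null }
}
```

With a maxRecordsPerFile of 2, writing five rows yields three files holding 2, 2 and 1 rows. Checking the threshold before writing (rather than after) means a new file is only opened when there is another row to write, so no empty trailing file is produced.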

Creating New OutputWriter

newOutputWriter(): Unit

newOutputWriter sets the recordsInFile counter to 0.

newOutputWriter releases resources (releaseResources), which closes the current OutputWriter, if any.

newOutputWriter requests the OutputWriterFactory (of the given WriteJobDescription) for the file extension (ext).

newOutputWriter requests the given FileCommitProtocol for the path of a new temporary data file with a -c[fileCounter][ext] suffix, where the file counter is zero-padded to three digits (e.g. -c000.snappy.parquet).
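
The suffix format can be sketched with Scala's f interpolator. dataFileSuffix is a hypothetical helper name; the format mirrors the description above (the letter c, then the file counter with a minimum width of three digits, then the extension):

```scala
// Hypothetical helper: "-c", the file counter zero-padded to (at least)
// three digits, then the file extension reported by the OutputWriterFactory.
def dataFileSuffix(fileCounter: Int, ext: String): String =
  f"-c$fileCounter%03d" + ext
```

Since %03d sets a minimum width rather than a maximum, a counter of 1000 or more simply produces more digits (e.g. -c1000.json).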

newOutputWriter requests the OutputWriterFactory (of the given WriteJobDescription) for a new OutputWriter for the new file path.

newOutputWriter informs the WriteTaskStatsTrackers that a new file is about to be written.
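
The order of steps in newOutputWriter can be sketched as follows. The traits SimpleWriter, SimpleWriterFactory, SimpleCommitter and SimpleStatsTracker (and the NewOutputWriterSketch class) are hypothetical, pared-down stand-ins for OutputWriter, OutputWriterFactory, FileCommitProtocol and WriteTaskStatsTracker; the real APIs take more parameters, such as a TaskAttemptContext and the data schema.

```scala
// Hypothetical, minimal stand-ins for Spark's collaborators.
trait SimpleWriter { def path: String; def close(): Unit }
trait SimpleWriterFactory {
  def getFileExtension: String
  def newInstance(path: String): SimpleWriter
}
trait SimpleCommitter { def newTaskTempFile(ext: String): String }
trait SimpleStatsTracker { def newFile(path: String): Unit }

final class NewOutputWriterSketch(
    factory: SimpleWriterFactory,
    committer: SimpleCommitter,
    statsTrackers: Seq[SimpleStatsTracker]) {
  var recordsInFile = 0L
  var fileCounter = 0
  var currentWriter: SimpleWriter = null

  def newOutputWriter(): Unit = {
    recordsInFile = 0                         // 1. reset the counter
    if (currentWriter != null) {              // 2. release resources
      currentWriter.close()
      currentWriter = null
    }
    val ext = factory.getFileExtension        // 3. ask for the file extension
    val path = committer.newTaskTempFile(f"-c$fileCounter%03d" + ext) // 4. new file path
    currentWriter = factory.newInstance(path) // 5. create the new writer
    statsTrackers.foreach(_.newFile(path))    // 6. notify the stats trackers
  }
}
```

Wiring these traits to a simple event log shows the previous file being closed before the next file is announced to the stats trackers.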


newOutputWriter is used when:

  • SingleDirectoryDataWriter is created
  • SingleDirectoryDataWriter is requested to write a record and the recordsInFile counter has reached a positive maxRecordsPerFile