BasicWriteTaskStatsTracker is a WriteTaskStatsTracker.
BasicWriteTaskStatsTracker takes the following to be created:
- Hadoop Configuration
- Task Commit Time SQLMetric
BasicWriteTaskStatsTracker is created when:
- BasicWriteJobStatsTracker is requested for a new WriteTaskStatsTracker instance
BasicWriteTaskStatsTracker uses a submittedFiles registry of the file paths added (written out to).
The submittedFiles registry is used to updateFileStats (for every registered file path) in getFinalStats.
A file path is removed when BasicWriteTaskStatsTracker is requested to closeFile.
All the file paths are removed in getFinalStats.
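The lifecycle of the registry described above can be sketched as follows. This is a simplified stand-in, not Spark's code: the class name SubmittedFilesSketch and the helpers finish, pending, and submittedCount are hypothetical, and the use of a mutable HashSet is an assumption.

```scala
import scala.collection.mutable

// Hypothetical model of the bookkeeping only; not the Spark class itself.
final class SubmittedFilesSketch {
  // Assumption: a mutable set is enough to model the registry.
  private val submittedFiles = mutable.HashSet.empty[String]
  private var numSubmittedFiles = 0

  // A file path is registered when a new file is announced.
  def newFile(filePath: String): Unit = {
    submittedFiles += filePath
    numSubmittedFiles += 1
  }

  // A file path is removed when the file is closed.
  def closeFile(filePath: String): Unit = {
    submittedFiles -= filePath
  }

  // Models the clean-up step of getFinalStats: all paths are removed.
  def finish(): Unit = submittedFiles.clear()

  def pending: Set[String] = submittedFiles.toSet
  def submittedCount: Int = numSubmittedFiles
}
```

Note that the counter of submitted files only ever grows in this sketch, while the registry shrinks as files are closed.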
updateFileStats(filePath: String): Unit
updateFileStats gets the size of the given filePath.
If the file length is found, it is added to the numBytes registry with the numFiles incremented.
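A minimal sketch of this step, assuming the local file system (via java.nio) as a stand-in for the file system the tracker actually consults. The class name FileStatsSketch and the helper fileSize are hypothetical.

```scala
import java.nio.file.{Files, NoSuchFileException, Paths}

// Hypothetical model; java.nio stands in for the real file system lookup.
final class FileStatsSketch {
  var numFiles: Int = 0
  var numBytes: Long = 0L

  // Returns the file size, or None when the file does not exist.
  private def fileSize(filePath: String): Option[Long] =
    try Some(Files.size(Paths.get(filePath)))
    catch { case _: NoSuchFileException => None }

  // Only a file whose length is found contributes to the counters.
  def updateFileStats(filePath: String): Unit =
    fileSize(filePath).foreach { len =>
      numBytes += len
      numFiles += 1
    }
}
```

A missing file leaves both counters untouched, which is why the number of files seen can end up lower than the number of files submitted.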
Processing New File Notification
newFile(filePath: String): Unit
newFile adds the given filePath to the submittedFiles registry and increments the numSubmittedFiles counter.
newFile is part of the WriteTaskStatsTracker abstraction.
getFinalStats(taskCommitTime: Long): WriteTaskStats
getFinalStats requests updateFileStats for every file path in the submittedFiles registry and then clears the registry.
getFinalStats sets the output metrics (of the current Spark task) as follows:
getFinalStats prints out the following INFO message when the numSubmittedFiles is different from the numFiles:
Expected [numSubmittedFiles] files, but only saw [numFiles]. This could be due to the output format not writing empty files, or files being not immediately visible in the filesystem.
getFinalStats adds the given taskCommitTime to the taskCommitTimeMetric if defined.
In the end, getFinalStats creates a new BasicWriteTaskStats with the partitions, numFiles, numBytes, and numRows.
getFinalStats is part of the WriteTaskStatsTracker abstraction.
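The order of the steps in getFinalStats can be sketched as below. This is a simplified model under stated assumptions, not Spark's implementation: StatsSketch and FinalStatsSketch are hypothetical names, sizeOf is an assumed size lookup, commitTimeSink stands in for the optional task commit time metric, println replaces the logger, and partitions and the task output metrics are left out.

```scala
import scala.collection.mutable

// Hypothetical result type standing in for BasicWriteTaskStats (no partitions).
final case class StatsSketch(numFiles: Int, numBytes: Long, numRows: Long)

final class FinalStatsSketch(
    sizeOf: String => Option[Long],          // assumed size lookup
    commitTimeSink: Option[Long => Unit]) {  // stands in for the optional metric
  private val submittedFiles = mutable.HashSet.empty[String]
  private var numSubmittedFiles = 0
  private var numFiles = 0
  private var numBytes = 0L
  var numRows = 0L

  def newFile(filePath: String): Unit = {
    submittedFiles += filePath
    numSubmittedFiles += 1
  }

  def getFinalStats(taskCommitTime: Long): StatsSketch = {
    // 1. Update the file stats for every submitted file, then clear the registry.
    submittedFiles.foreach { path =>
      sizeOf(path).foreach { len =>
        numBytes += len
        numFiles += 1
      }
    }
    submittedFiles.clear()

    // 2. Report a mismatch between submitted files and files actually seen.
    if (numSubmittedFiles != numFiles) {
      println(s"Expected $numSubmittedFiles files, but only saw $numFiles.")
    }

    // 3. Record the commit time only if a sink is defined.
    commitTimeSink.foreach(sink => sink(taskCommitTime))

    // 4. Build the final statistics.
    StatsSketch(numFiles, numBytes, numRows)
  }
}
```

Passing None for commitTimeSink models a tracker created without a commit time metric, in which case step 3 does nothing.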
Enable ALL logging level for the org.apache.spark.sql.execution.datasources.BasicWriteTaskStatsTracker logger to see what happens inside.
Add the following line to
Refer to Logging.