BasicWriteTaskStatsTracker¶
BasicWriteTaskStatsTracker
is a WriteTaskStatsTracker.
Creating Instance¶
BasicWriteTaskStatsTracker
takes the following to be created:
- Hadoop Configuration
- Task Commit Time SQLMetric
BasicWriteTaskStatsTracker
is created when:
BasicWriteJobStatsTracker
is requested for a new WriteTaskStatsTracker instance
submittedFiles¶
BasicWriteTaskStatsTracker
uses submittedFiles
registry of the file paths added (written out to).
The submittedFiles
registry is used to updateFileStats when getFinalStats.
A file path is removed when BasicWriteTaskStatsTracker
is requested to closeFile.
All the file paths are removed in getFinalStats.
updateFileStats¶
updateFileStats(
filePath: String): Unit
updateFileStats
gets the size of the given filePath
.
If the file length is found, it is added to the numBytes registry with the numFiles incremented.
Processing New File Notification¶
newFile(
filePath: String): Unit
newFile
adds the given filePath
to the submittedFiles registry and increments the numSubmittedFiles counter.
newFile
is part of the WriteTaskStatsTracker abstraction.
Final WriteTaskStats¶
getFinalStats(
taskCommitTime: Long): WriteTaskStats
getFinalStats
updateFileStats for every submittedFiles that are then cleared up.
getFinalStats
sets the output metrics (of the current Spark task) as follows:
getFinalStats
prints out the following INFO message when the numSubmittedFiles is different from the numFiles:
Expected [numSubmittedFiles] files, but only saw $numFiles.
This could be due to the output format not writing empty files,
or files being not immediately visible in the filesystem.
getFinalStats
adds the given taskCommitTime
to the taskCommitTimeMetric if defined.
In the end, creates a new BasicWriteTaskStats with the partitions, numFiles, numBytes, and numRows.
getFinalStats
is part of the WriteTaskStatsTracker abstraction.
Logging¶
Enable ALL
logging level for org.apache.spark.sql.execution.datasources.BasicWriteTaskStatsTracker
logger to see what happens inside.
Add the following line to conf/log4j2.properties
:
log4j.logger.org.apache.spark.sql.execution.datasources.BasicWriteTaskStatsTracker=ALL
Refer to Logging.