TaskMetrics¶
TaskMetrics is a collection of metrics (accumulators) tracked during execution of a task.
Creating Instance¶
TaskMetrics takes no input arguments to be created.
TaskMetrics is created when:
- Stage is requested to makeNewStageAttempt
Metrics¶
ShuffleWriteMetrics¶
- shuffle.write.bytesWritten
- shuffle.write.recordsWritten
- shuffle.write.writeTime
ShuffleWriteMetrics is exposed through the Dropwizard metrics system by ExecutorSource (when TaskRunner is about to finish running) as:
- shuffleBytesWritten
- shuffleRecordsWritten
- shuffleWriteTime
ShuffleWriteMetrics can be monitored using:
- StatsReportListener (when a stage completes)
  - shuffle bytes written
- JsonProtocol (when requested to taskMetricsToJson)
  - Shuffle Bytes Written
  - Shuffle Write Time
  - Shuffle Records Written
shuffleWriteMetrics is used when:
- ShuffleWriteProcessor is requested for a ShuffleWriteMetricsReporter
- SortShuffleWriter is created
- AppStatusListener is requested to handle a SparkListenerTaskEnd
- LiveTask is requested to updateMetrics
- ExternalSorter is requested to writePartitionedFile (to create a DiskBlockObjectWriter), writePartitionedMapOutput
- ShuffleExchangeExec (Spark SQL) is requested for a ShuffleWriteProcessor (to create a ShuffleDependency)
Memory Bytes Spilled¶
Number of in-memory bytes spilled by the tasks (of a stage)
_memoryBytesSpilled is a LongAccumulator with internal.metrics.memoryBytesSpilled name.
memoryBytesSpilled metric is exposed using ExecutorSource as memoryBytesSpilled (using Dropwizard metrics system).
memoryBytesSpilled¶
memoryBytesSpilled is the sum of all memory bytes spilled across all tasks.
memoryBytesSpilled is used when:
- SpillListener is requested to onStageCompleted
- TaskRunner is requested to run (and updates task metrics in the Dropwizard metrics system)
- LiveTask is requested to updateMetrics
- JsonProtocol is requested to taskMetricsToJson
incMemoryBytesSpilled¶
incMemoryBytesSpilled adds the v value to the _memoryBytesSpilled metric.
incMemoryBytesSpilled is used when:
- Aggregator is requested to updateMetrics
- BasePythonRunner.ReaderIterator is requested to handleTimingData
- CoGroupedRDD is requested to compute a partition
- ShuffleExternalSorter is requested to spill
- JsonProtocol is requested to taskMetricsFromJson
- ExternalSorter is requested to insertAllAndUpdateMetrics, writePartitionedFile, writePartitionedMapOutput
- UnsafeExternalSorter is requested to createWithExistingInMemorySorter, spill
- UnsafeExternalSorter.SpillableIterator is requested to spill
TaskContext¶
TaskMetrics is available using TaskContext.taskMetrics.
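As a minimal sketch of reading TaskMetrics from inside a running task, the following assumes Spark is on the classpath and a SparkContext is passed in. The object name, the method name and the sample data are hypothetical and only serve the illustration.

```scala
import org.apache.spark.{SparkContext, TaskContext}

// Hypothetical helper (not part of Spark): reads TaskMetrics inside each task.
object TaskMetricsFromTaskContext {

  // Returns one (partitionId, memoryBytesSpilled) pair per partition.
  def spilledBytesPerPartition(sc: SparkContext): Array[(Int, Long)] =
    sc.parallelize(1 to 1000, numSlices = 4)
      .mapPartitions { rows =>
        // Consume the partition so the task has done its work before reading metrics.
        rows.foreach(_ => ())
        // TaskContext.get() is only defined while a task is running on an executor.
        val metrics = TaskContext.get().taskMetrics()
        Iterator((TaskContext.getPartitionId(), metrics.memoryBytesSpilled))
      }
      .collect()
}
```

The values are a snapshot taken while the task is still running, so metrics that are only updated when the task finishes may not be final at this point.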
Serializable¶
TaskMetrics is a Serializable (Java).
Task¶
TaskMetrics is part of Task.
SparkListener¶
TaskMetrics is available using SparkListener and intercepting SparkListenerTaskEnd events.
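One way to intercept SparkListenerTaskEnd events is a custom SparkListener that accumulates the metrics covered on this page. This is a sketch, assuming Spark is on the classpath. The class name and its two counters are hypothetical.

```scala
import java.util.concurrent.atomic.AtomicLong

import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

// Hypothetical listener (not part of Spark): sums spill and shuffle-write
// metrics over every task that ends while it is registered.
class SpillAndShuffleWriteListener extends SparkListener {
  val memoryBytesSpilled = new AtomicLong(0L)
  val shuffleBytesWritten = new AtomicLong(0L)

  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    // taskMetrics may be null (e.g. when a task failed before reporting metrics).
    val metrics = taskEnd.taskMetrics
    if (metrics != null) {
      memoryBytesSpilled.addAndGet(metrics.memoryBytesSpilled)
      shuffleBytesWritten.addAndGet(metrics.shuffleWriteMetrics.bytesWritten)
    }
  }
}
```

The listener can be registered with `sc.addSparkListener(new SpillAndShuffleWriteListener)` before running a job. The counters are atomic so that they can be read safely from the driver's main thread while events are still being delivered.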
StatsReportListener¶
StatsReportListener can be used for summary statistics at runtime (after a stage completes).
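A short sketch of enabling StatsReportListener programmatically, assuming an existing SparkContext. The wrapper object and method names are hypothetical.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.scheduler.StatsReportListener

// Hypothetical helper (not part of Spark).
object StageStats {
  // Registers a StatsReportListener so summary statistics are reported
  // after each stage completes.
  def enable(sc: SparkContext): Unit =
    sc.addSparkListener(new StatsReportListener)
}
```

The same listener can also be registered without code changes through the `spark.extraListeners` configuration property set to `org.apache.spark.scheduler.StatsReportListener`. The statistics go through Spark's logging, so the configured log level may need to allow them.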
Spark History Server¶
Spark History Server uses EventLoggingListener to intercept post-execution statistics (incl. TaskMetrics).
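Event logging has to be enabled in the application for these statistics to be recorded. The sketch below sets the two standard configuration properties on a SparkConf. The object name, the method name and the directory passed in are hypothetical.

```scala
import org.apache.spark.SparkConf

// Hypothetical helper (not part of Spark).
object EventLogConf {
  // Turns on event logging and points it at the given directory,
  // e.g. "file:///tmp/spark-events" (an assumed location that must already exist).
  def withEventLog(conf: SparkConf, dir: String): SparkConf =
    conf
      .set("spark.eventLog.enabled", "true")
      .set("spark.eventLog.dir", dir)
}
```

For the recorded events to show up, Spark History Server should read from the same location (configured with `spark.history.fs.logDirectory`).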