TaskMetrics¶
TaskMetrics is a collection of metrics (accumulators) tracked during execution of a task.
Creating Instance¶
TaskMetrics takes no input arguments to be created.
TaskMetrics is created when:
- Stage is requested to makeNewStageAttempt
Metrics¶
ShuffleWriteMetrics¶
- shuffle.write.bytesWritten
- shuffle.write.recordsWritten
- shuffle.write.writeTime
ShuffleWriteMetrics is exposed through the Dropwizard metrics system by ExecutorSource (when TaskRunner is about to finish running) under the following names:
- shuffleBytesWritten
- shuffleRecordsWritten
- shuffleWriteTime
ShuffleWriteMetrics can be monitored using:
- StatsReportListener (when a stage completes)
    - shuffle bytes written
- JsonProtocol (when requested to taskMetricsToJson)
    - Shuffle Bytes Written
    - Shuffle Write Time
    - Shuffle Records Written
shuffleWriteMetrics is used when:
- ShuffleWriteProcessor is requested for a ShuffleWriteMetricsReporter
- SortShuffleWriter is created
- AppStatusListener is requested to handle a SparkListenerTaskEnd
- LiveTask is requested to updateMetrics
- ExternalSorter is requested to writePartitionedFile (to create a DiskBlockObjectWriter), writePartitionedMapOutput
- ShuffleExchangeExec (Spark SQL) is requested for a ShuffleWriteProcessor (to create a ShuffleDependency)
Memory Bytes Spilled¶
Number of in-memory bytes spilled by the tasks (of a stage)
_memoryBytesSpilled is a LongAccumulator named internal.metrics.memoryBytesSpilled.
The memoryBytesSpilled metric is exposed by ExecutorSource as memoryBytesSpilled (using the Dropwizard metrics system).
memoryBytesSpilled¶
memoryBytesSpilled: Long
memoryBytesSpilled is the sum of all memory bytes spilled across all tasks.
memoryBytesSpilled is used when:
- SpillListener is requested to onStageCompleted
- TaskRunner is requested to run (and updates task metrics in the Dropwizard metrics system)
- LiveTask is requested to updateMetrics
- JsonProtocol is requested to taskMetricsToJson
incMemoryBytesSpilled¶
incMemoryBytesSpilled(
v: Long): Unit
incMemoryBytesSpilled adds the v value to the _memoryBytesSpilled metric.
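The accumulator-backed pattern described above can be sketched outside of Spark's internals. This is a minimal illustration, not the actual TaskMetrics code: the `SpillCounter` class is hypothetical, and only `LongAccumulator` (with its `add` and `sum` methods) comes from Spark's public API.

```scala
import org.apache.spark.util.LongAccumulator

// Hypothetical stand-in for the spill-related part of TaskMetrics.
// It only mirrors the "increment an accumulator, read back its sum" pattern.
class SpillCounter {
  // Backing accumulator, analogous to the _memoryBytesSpilled metric
  private val _memoryBytesSpilled = new LongAccumulator

  // Adds v to the accumulated value
  def incMemoryBytesSpilled(v: Long): Unit = _memoryBytesSpilled.add(v)

  // Reads the accumulated value
  def memoryBytesSpilled: Long = _memoryBytesSpilled.sum
}

object SpillCounterDemo {
  def main(args: Array[String]): Unit = {
    val counter = new SpillCounter
    counter.incMemoryBytesSpilled(10L)
    counter.incMemoryBytesSpilled(32L)
    println(s"memoryBytesSpilled = ${counter.memoryBytesSpilled}")
  }
}
```

The real incMemoryBytesSpilled is internal to Spark, so user code typically only reads the resulting value (e.g. through memoryBytesSpilled).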
incMemoryBytesSpilled is used when:
- Aggregator is requested to updateMetrics
- BasePythonRunner.ReaderIterator is requested to handleTimingData
- CoGroupedRDD is requested to compute a partition
- ShuffleExternalSorter is requested to spill
- JsonProtocol is requested to taskMetricsFromJson
- ExternalSorter is requested to insertAllAndUpdateMetrics, writePartitionedFile, writePartitionedMapOutput
- UnsafeExternalSorter is requested to createWithExistingInMemorySorter, spill
- UnsafeExternalSorter.SpillableIterator is requested to spill
TaskContext¶
TaskMetrics is available using TaskContext.taskMetrics.
TaskContext.get.taskMetrics
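As a fuller illustration of the snippet above, the following sketch reads TaskMetrics from inside a running task. The object name, application name, local master, and the tiny data set are all assumptions made for the example; the metrics are read while the task is still running, so they reflect only what has been recorded up to that point.

```scala
import org.apache.spark.{SparkConf, SparkContext, TaskContext}

object TaskContextMetricsDemo {
  def main(args: Array[String]): Unit = {
    // Local master and app name are placeholders for this sketch
    val conf = new SparkConf().setMaster("local[2]").setAppName("task-context-metrics")
    val sc = new SparkContext(conf)
    try {
      val perPartition = sc.parallelize(1 to 100, 4).mapPartitions { iter =>
        val recordCount = iter.size            // consume the partition
        val ctx = TaskContext.get()            // context of the currently running task
        val metrics = ctx.taskMetrics()        // TaskMetrics of this task
        Iterator((ctx.partitionId(), recordCount, metrics.memoryBytesSpilled))
      }.collect()

      perPartition.foreach { case (partition, records, spilled) =>
        println(s"partition=$partition records=$records memoryBytesSpilled=$spilled")
      }
    } finally {
      sc.stop()
    }
  }
}
```

TaskContext.get() returns a task context only inside a task, so the call has to sit inside a closure that runs on an executor (here, mapPartitions).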
Serializable¶
TaskMetrics is a Serializable (Java).
Task¶
TaskMetrics is part of Task.
task.metrics
SparkListener¶
TaskMetrics is available to a SparkListener that intercepts SparkListenerTaskEnd events.
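A minimal sketch of such a listener follows. The class name `SpillAndShuffleListener`, the demo object, and the sample job are hypothetical; the listener only relies on the public SparkListener, SparkListenerTaskEnd, and TaskMetrics accessors, and it sums a few of the metrics covered on this page across all finished tasks.

```scala
import java.util.concurrent.atomic.AtomicLong

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

// Hypothetical listener that aggregates selected TaskMetrics values across tasks
class SpillAndShuffleListener extends SparkListener {
  val shuffleBytesWritten = new AtomicLong(0L)
  val shuffleRecordsWritten = new AtomicLong(0L)
  val memoryBytesSpilled = new AtomicLong(0L)

  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    // Guard against a missing TaskMetrics (it may be null for some failed tasks)
    Option(taskEnd.taskMetrics).foreach { metrics =>
      shuffleBytesWritten.addAndGet(metrics.shuffleWriteMetrics.bytesWritten)
      shuffleRecordsWritten.addAndGet(metrics.shuffleWriteMetrics.recordsWritten)
      memoryBytesSpilled.addAndGet(metrics.memoryBytesSpilled)
    }
  }
}

object SpillAndShuffleListenerDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setMaster("local[2]").setAppName("task-metrics-listener"))
    val listener = new SpillAndShuffleListener
    sc.addSparkListener(listener)

    // reduceByKey introduces a shuffle, so the map tasks report shuffle write metrics
    sc.parallelize(1 to 1000, 4).map(i => (i % 10, 1)).reduceByKey(_ + _).count()

    // Stopping the context also stops the listener bus after pending events are delivered
    sc.stop()

    println(s"shuffle bytes written:   ${listener.shuffleBytesWritten.get}")
    println(s"shuffle records written: ${listener.shuffleRecordsWritten.get}")
    println(s"memory bytes spilled:    ${listener.memoryBytesSpilled.get}")
  }
}
```

Listener events are delivered asynchronously, which is why the sketch reads the totals only after the context has been stopped.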
StatsReportListener¶
StatsReportListener can be used for summary statistics at runtime (after a stage completes).
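One way to enable it, sketched below, is to register the listener through the spark.extraListeners configuration property. The object name, application name, local master, and sample job are assumptions for the example; the summary statistics are written to the application log (at INFO level), so whether they are visible depends on the logging configuration.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object StatsReportListenerDemo {
  // Builds a configuration that registers StatsReportListener at startup
  def buildConf(): SparkConf =
    new SparkConf()
      .setMaster("local[2]")                 // placeholder master for this sketch
      .setAppName("stats-report-listener")   // placeholder app name
      .set("spark.extraListeners", "org.apache.spark.scheduler.StatsReportListener")

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(buildConf())
    try {
      // Two stages (map side and reduce side); a report is logged as each completes
      sc.parallelize(1 to 1000, 4).map(i => (i % 10, 1)).reduceByKey(_ + _).count()
    } finally {
      sc.stop()
    }
  }
}
```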
Spark History Server¶
Spark History Server uses EventLoggingListener to intercept post-execution statistics (incl. TaskMetrics).
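For the History Server to have anything to show, event logging has to be enabled in the application. The sketch below assumes a local run and a temporary directory as the event log location (both are choices made for this example only); spark.eventLog.enabled and spark.eventLog.dir are the standard configuration properties.

```scala
import java.nio.file.Files

import org.apache.spark.{SparkConf, SparkContext}

object EventLogDemo {
  // Builds a configuration with event logging turned on for the given directory
  def buildConf(eventLogDir: String): SparkConf =
    new SparkConf()
      .setMaster("local[2]")             // placeholder master for this sketch
      .setAppName("event-log-demo")      // placeholder app name
      .set("spark.eventLog.enabled", "true")
      .set("spark.eventLog.dir", eventLogDir)

  def main(args: Array[String]): Unit = {
    // The event log directory has to exist before the application starts
    val dir = Files.createTempDirectory("spark-events")
    val sc = new SparkContext(buildConf(dir.toUri.toString))
    try {
      sc.parallelize(1 to 1000, 4).map(i => (i % 10, 1)).reduceByKey(_ + _).count()
    } finally {
      sc.stop()
    }
    println(s"Event logs written under: $dir")
  }
}
```

Pointing the History Server at the same directory (its spark.history.fs.logDirectory property) would then let it replay the logged events, including the task metrics of finished tasks.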