TaskMetrics¶
TaskMetrics is a collection of metrics (accumulators) tracked during execution of a task.
Creating Instance¶
TaskMetrics takes no input arguments to be created.
TaskMetrics is created when:
- Stage is requested to makeNewStageAttempt
Metrics¶
ShuffleWriteMetrics¶
- shuffle.write.bytesWritten
- shuffle.write.recordsWritten
- shuffle.write.writeTime
ShuffleWriteMetrics is exposed through the Dropwizard metrics system by ExecutorSource (when TaskRunner is about to finish running) as the following metrics:
- shuffleBytesWritten
- shuffleRecordsWritten
- shuffleWriteTime
ShuffleWriteMetrics can be monitored using:
- StatsReportListener (when a stage completes)
  - shuffle bytes written
- JsonProtocol (when requested to taskMetricsToJson)
  - Shuffle Bytes Written
  - Shuffle Write Time
  - Shuffle Records Written
shuffleWriteMetrics is used when:
- ShuffleWriteProcessor is requested for a ShuffleWriteMetricsReporter
- SortShuffleWriter is created
- AppStatusListener is requested to handle a SparkListenerTaskEnd
- LiveTask is requested to updateMetrics
- ExternalSorter is requested to writePartitionedFile (to create a DiskBlockObjectWriter), writePartitionedMapOutput
- ShuffleExchangeExec (Spark SQL) is requested for a ShuffleWriteProcessor (to create a ShuffleDependency)
Memory Bytes Spilled¶
Number of in-memory bytes spilled by the tasks (of a stage).
_memoryBytesSpilled is a LongAccumulator with the name internal.metrics.memoryBytesSpilled.
The memoryBytesSpilled metric is exposed by ExecutorSource as memoryBytesSpilled (using the Dropwizard metrics system).
memoryBytesSpilled¶
memoryBytesSpilled: Long
memoryBytesSpilled is the sum of all memory bytes spilled across all tasks.
memoryBytesSpilled is used when:
- SpillListener is requested to onStageCompleted
- TaskRunner is requested to run (and updates task metrics in the Dropwizard metrics system)
- LiveTask is requested to updateMetrics
- JsonProtocol is requested to taskMetricsToJson
incMemoryBytesSpilled¶
incMemoryBytesSpilled(
  v: Long): Unit
incMemoryBytesSpilled adds the v value to the _memoryBytesSpilled metric.
incMemoryBytesSpilled is used when:
- Aggregator is requested to updateMetrics
- BasePythonRunner.ReaderIterator is requested to handleTimingData
- CoGroupedRDD is requested to compute a partition
- ShuffleExternalSorter is requested to spill
- JsonProtocol is requested to taskMetricsFromJson
- ExternalSorter is requested to insertAllAndUpdateMetrics, writePartitionedFile, writePartitionedMapOutput
- UnsafeExternalSorter is requested to createWithExistingInMemorySorter, spill
- UnsafeExternalSorter.SpillableIterator is requested to spill
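The relationship between incMemoryBytesSpilled, the _memoryBytesSpilled accumulator and the memoryBytesSpilled getter can be sketched as follows. This is a simplified stand-in, not Spark's TaskMetrics: the class and object names (SpillMetricsSketch, SpillMetricsSketchDemo) are hypothetical, and the real incMemoryBytesSpilled is internal to Spark, so application code cannot call it directly. Only LongAccumulator is Spark's own class.

```scala
import org.apache.spark.util.LongAccumulator

// Hypothetical, simplified stand-in for the spill-related part of TaskMetrics
class SpillMetricsSketch extends Serializable {
  // Backing accumulator; every increment is added to its running sum
  private val _memoryBytesSpilled = new LongAccumulator

  // Reads the current sum of the accumulator
  def memoryBytesSpilled: Long = _memoryBytesSpilled.sum

  // Adds v to the accumulator
  def incMemoryBytesSpilled(v: Long): Unit = _memoryBytesSpilled.add(v)
}

object SpillMetricsSketchDemo {
  def main(args: Array[String]): Unit = {
    val metrics = new SpillMetricsSketch
    metrics.incMemoryBytesSpilled(10L)
    metrics.incMemoryBytesSpilled(32L)
    println(metrics.memoryBytesSpilled) // prints 42
  }
}
```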
TaskContext¶
TaskMetrics is available using TaskContext.taskMetrics.
TaskContext.get.taskMetrics
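A minimal sketch of reading TaskMetrics from inside a running task is shown below. It assumes a local Spark session; the object name TaskContextMetricsDemo and the tiny dataset are illustrative only. For such a small job the spilled-bytes values are expected to be zero.

```scala
import org.apache.spark.TaskContext
import org.apache.spark.sql.SparkSession

object TaskContextMetricsDemo {
  // Returns (partition id, memory bytes spilled so far) as seen inside each task
  def run(): Array[(Int, Long)] = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("task-context-metrics-sketch")
      .getOrCreate()
    try {
      spark.sparkContext
        .parallelize(1 to 100, 2)
        .mapPartitions { _ =>
          // TaskContext.get() is non-null only inside a running task
          val ctx = TaskContext.get()
          val metrics = ctx.taskMetrics()
          Iterator.single((ctx.partitionId(), metrics.memoryBytesSpilled))
        }
        .collect()
    } finally {
      spark.stop()
    }
  }

  def main(args: Array[String]): Unit =
    run().foreach { case (partition, spilled) =>
      println(s"partition $partition: memoryBytesSpilled=$spilled")
    }
}
```

On the driver TaskContext.get() returns null, so this kind of access only makes sense in code that runs as part of a task.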
Serializable¶
TaskMetrics is a Serializable (Java).
Task¶
TaskMetrics is part of Task.
task.metrics
SparkListener¶
TaskMetrics is available using SparkListener and intercepting SparkListenerTaskEnd events.
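As an illustration of intercepting SparkListenerTaskEnd events, here is a hedged sketch of a custom listener that totals two TaskMetrics values across finished tasks. The listener and demo names (TaskMetricsTotalsListener, TaskMetricsTotalsDemo) are hypothetical, and the job is an arbitrary small shuffle chosen so that shuffle write metrics are non-zero.

```scala
import java.util.concurrent.atomic.AtomicLong

import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}
import org.apache.spark.sql.SparkSession

// Hypothetical listener: totals two TaskMetrics values over every task
// that finishes while the listener is registered
class TaskMetricsTotalsListener extends SparkListener {
  val shuffleBytesWritten = new AtomicLong(0L)
  val memoryBytesSpilled = new AtomicLong(0L)

  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    // taskMetrics may be null for some failed tasks, hence the Option guard
    Option(taskEnd.taskMetrics).foreach { metrics =>
      shuffleBytesWritten.addAndGet(metrics.shuffleWriteMetrics.bytesWritten)
      memoryBytesSpilled.addAndGet(metrics.memoryBytesSpilled)
    }
  }
}

object TaskMetricsTotalsDemo {
  // Runs a small shuffle job locally and returns the total shuffle bytes written
  def run(): Long = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("task-metrics-listener-sketch")
      .getOrCreate()
    val listener = new TaskMetricsTotalsListener
    spark.sparkContext.addSparkListener(listener)

    // reduceByKey forces a shuffle, so the map-side tasks write shuffle data
    spark.sparkContext
      .parallelize(1 to 1000, 4)
      .map(n => (n % 10, 1))
      .reduceByKey(_ + _)
      .count()

    // Listener events are delivered asynchronously; stopping the session
    // stops the listener bus after the queued events have been processed
    spark.stop()
    listener.shuffleBytesWritten.get
  }

  def main(args: Array[String]): Unit =
    println(s"shuffle bytes written: ${run()}")
}
```

The totals are read only after the session is stopped because the listener runs on a separate thread and may lag behind job completion.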
StatsReportListener¶
StatsReportListener can be used for summary statistics at runtime (after a stage completes).
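One way to enable StatsReportListener is the spark.extraListeners configuration property, as in the sketch below. It assumes no Spark session is active yet (the property is read when the SparkContext is created), and the object name StatsReportListenerDemo is illustrative. The summaries go through Spark's logging at INFO level, so the logging configuration has to allow INFO messages for them to show up.

```scala
import org.apache.spark.sql.SparkSession

object StatsReportListenerDemo {
  // Runs a small shuffle job with StatsReportListener registered and
  // returns the configured value of spark.extraListeners
  def run(): String = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("stats-report-listener-sketch")
      // spark.extraListeners takes a comma-separated list of SparkListener class names
      .config("spark.extraListeners", "org.apache.spark.scheduler.StatsReportListener")
      .getOrCreate()
    try {
      // A shuffle job, so the per-stage summary has shuffle write data to report
      spark.sparkContext
        .parallelize(1 to 1000, 4)
        .map(n => (n % 10, 1))
        .reduceByKey(_ + _)
        .count()
      spark.sparkContext.getConf.get("spark.extraListeners")
    } finally {
      spark.stop()
    }
  }

  def main(args: Array[String]): Unit =
    println(s"registered extra listeners: ${run()}")
}
```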
Spark History Server¶
Spark History Server uses EventLoggingListener to intercept post-execution statistics (incl. TaskMetrics).
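For the history server to have anything to show, the application must write an event log. The sketch below enables event logging through the spark.eventLog.enabled and spark.eventLog.dir properties. It assumes no Spark session is active yet; the temporary directory and the object name EventLogDemo are illustrative. A history server would then need its spark.history.fs.logDirectory pointed at the same directory.

```scala
import java.nio.file.Files

import org.apache.spark.sql.SparkSession

object EventLogDemo {
  // Runs a small job with event logging enabled and returns the names
  // of the files found in the event log directory afterwards
  def run(): Array[String] = {
    // The event log directory must exist before the SparkContext starts
    val eventLogDir = Files.createTempDirectory("spark-events")
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("event-log-sketch")
      .config("spark.eventLog.enabled", "true")
      .config("spark.eventLog.dir", eventLogDir.toUri.toString)
      .getOrCreate()
    try {
      spark.sparkContext.parallelize(1 to 100, 2).count()
    } finally {
      // Stopping the session closes the event log
      spark.stop()
    }
    eventLogDir.toFile.list()
  }

  def main(args: Array[String]): Unit =
    run().foreach(name => println(s"event log: $name"))
}
```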