Skip to content

= SparkListener

SparkListener is a mechanism in Spark to intercept events from the Spark scheduler that are emitted over the course of execution of a Spark application.

SparkListener extends <> with all the callback methods being no-op/do-nothing.

Spark <SparkListeners internally>> to manage communication between internal components in the distributed environment for a Spark application, e.g.[web UI],[event persistence] (for History Server),[dynamic allocation of executors],[keeping track of executors (using HeartbeatReceiver)] and others.

You can develop your own custom SparkListener and register it using[SparkContext.addSparkListener] method or[spark.extraListeners] configuration property.

With SparkListener you can focus on Spark events of your liking and process a subset of all scheduling events.


Enable INFO logging level for org.apache.spark.SparkContext logger to see when custom Spark listeners are registered.

INFO SparkContext: Registered listener org.apache.spark.scheduler.StatsReportListener


== [[SparkListenerInterface]] SparkListenerInterface -- Internal Contract for Spark Listeners

SparkListenerInterface is an private[spark] contract for Spark listeners to intercept events from the Spark scheduler.

NOTE: <> and <> Spark listeners are direct implementations of SparkListenerInterface contract to help developing more sophisticated Spark listeners.

.SparkListenerInterface Methods [cols="1,1,2",options="header",width="100%"] |=== | Method | Event | Reason

| onApplicationEnd | [[SparkListenerApplicationEnd]] SparkListenerApplicationEnd | SparkContext does postApplicationEnd

| [[onApplicationStart]] onApplicationStart | [[SparkListenerApplicationStart]] SparkListenerApplicationStart | SparkContext does postApplicationStart

| [[onBlockManagerAdded]] onBlockManagerAdded | [[SparkListenerBlockManagerAdded]] SparkListenerBlockManagerAdded | BlockManagerMasterEndpoint[has registered a BlockManager].

| [[onBlockManagerRemoved]] onBlockManagerRemoved | [[SparkListenerBlockManagerRemoved]] SparkListenerBlockManagerRemoved | BlockManagerMasterEndpoint[has removed a BlockManager] (which is when...FIXME)

| [[onBlockUpdated]] onBlockUpdated | [[SparkListenerBlockUpdated]] SparkListenerBlockUpdated | BlockManagerMasterEndpoint receives a[UpdateBlockInfo] event (which is when BlockManager[reports a block status update to driver]).

| onEnvironmentUpdate | [[SparkListenerEnvironmentUpdate]] SparkListenerEnvironmentUpdate | SparkContext does postEnvironmentUpdate.

| onExecutorMetricsUpdate | [[SparkListenerExecutorMetricsUpdate]] SparkListenerExecutorMetricsUpdate |

| onExecutorAdded | [[SparkListenerExecutorAdded]] SparkListenerExecutorAdded | [[onExecutorAdded]] DriverEndpoint RPC endpoint (of CoarseGrainedSchedulerBackend)[receives RegisterExecutor message], MesosFineGrainedSchedulerBackend does resourceOffers, and LocalSchedulerBackendEndpoint starts.

| [[onExecutorBlacklisted]] onExecutorBlacklisted | [[SparkListenerExecutorBlacklisted]] SparkListenerExecutorBlacklisted | FIXME

| [[onExecutorRemoved]] onExecutorRemoved | [[SparkListenerExecutorRemoved]] SparkListenerExecutorRemoved | DriverEndpoint RPC endpoint (of CoarseGrainedSchedulerBackend) does[removeExecutor] and MesosFineGrainedSchedulerBackend does removeExecutor.

| [[onExecutorUnblacklisted]] onExecutorUnblacklisted | [[SparkListenerExecutorUnblacklisted]] SparkListenerExecutorUnblacklisted | FIXME

| onJobEnd | [[SparkListenerJobEnd]] SparkListenerJobEnd | DAGScheduler does cleanUpAfterSchedulerStop, handleTaskCompletion, failJobAndIndependentStages, and markMapStageJobAsFinished.

| [[onJobStart]] onJobStart | [[SparkListenerJobStart]] SparkListenerJobStart | DAGScheduler handles[JobSubmitted] and[MapStageSubmitted] messages

| [[onNodeBlacklisted]] onNodeBlacklisted | [[SparkListenerNodeBlacklisted]] SparkListenerNodeBlacklisted | FIXME

| [[onNodeUnblacklisted]] onNodeUnblacklisted | [[SparkListenerNodeUnblacklisted]] SparkListenerNodeUnblacklisted | FIXME

| [[onStageCompleted]] onStageCompleted | [[SparkListenerStageCompleted]] SparkListenerStageCompleted | DAGScheduler[marks a stage as finished].

| [[onStageSubmitted]] onStageSubmitted | [[SparkListenerStageSubmitted]] SparkListenerStageSubmitted | DAGScheduler[submits the missing tasks of a stage (in a Spark job)].

| [[onTaskEnd]] onTaskEnd | [[SparkListenerTaskEnd]] SparkListenerTaskEnd | DAGScheduler[handles a task completion]

| onTaskGettingResult | [[SparkListenerTaskGettingResult]] SparkListenerTaskGettingResult | DAGScheduler[handles GettingResultEvent event]

| [[onTaskStart]] onTaskStart | [[SparkListenerTaskStart]] SparkListenerTaskStart | DAGScheduler is informed that a[task is about to start].

| [[onUnpersistRDD]] onUnpersistRDD | [[SparkListenerUnpersistRDD]] SparkListenerUnpersistRDD | SparkContext[unpersists an RDD], i.e. removes RDD blocks from BlockManagerMaster (that can be triggered[explicitly] or[implicitly]).

| [[onOtherEvent]] onOtherEvent | [[SparkListenerEvent]] SparkListenerEvent | Catch-all callback that is often used in Spark SQL to handle custom events. |===

== [[builtin-implementations]] Built-In Spark Listeners

.Built-In Spark Listeners [cols="1,2",options="header",width="100%"] |=== | Spark Listener | Description |[EventLoggingListener] | Logs JSON-encoded events to a file that can later be read by[History Server] |[StatsReportListener] | | [[SparkFirehoseListener]] SparkFirehoseListener | Allows users to receive all <> events by overriding the single onEvent method only. |[ExecutorAllocationListener] | |[HeartbeatReceiver] | | spark-streaming/[StreamingJobProgressListener] | |[ExecutorsListener] | Prepares information for[Executors tab] in[web UI] |[StorageStatusListener],[RDDOperationGraphListener],[EnvironmentListener],[BlockStatusListener] and[StorageListener] | For[web UI] | SpillListener | | ApplicationEventListener | |[StreamingQueryListenerBus] | |[SQLListener] /[SQLHistoryListener] | Support for[History Server] | spark-streaming/[StreamingListenerBus] | |[JobProgressListener] | |===

Last update: 2020-10-06