
= EventLoggingListener

EventLoggingListener is a ROOT:SparkListener.md[] that <<logEvent, logs JSON-encoded events>> of a Spark application with event logging enabled (based on ROOT:configuration-properties.md#spark.eventLog.enabled[spark.eventLog.enabled] configuration property).

EventLoggingListener supports custom spark-history-server:configuration-properties.md#EventLoggingListener[configuration properties].

EventLoggingListener writes out log files to a directory (based on ROOT:configuration-properties.md#spark.eventLog.dir[spark.eventLog.dir] configuration property). All Spark events are logged except ROOT:SparkListener.md#SparkListenerBlockUpdated[SparkListenerBlockUpdated] and ROOT:SparkListener.md#SparkListenerExecutorMetricsUpdate[SparkListenerExecutorMetricsUpdate].

TIP: Use index.md[Spark History Server] to view the event logs in a browser (similarly to the webui:index.md[web UI] of a Spark application).

[[inprogress-extension]][[IN_PROGRESS]] EventLoggingListener uses the .inprogress file extension for in-flight event log files of active Spark applications.

EventLoggingListener can compress events (based on ROOT:configuration-properties.md#spark.eventLog.compress[spark.eventLog.compress] configuration property).
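
For example, event logging can be enabled through a SparkConf before a SparkContext is created. A minimal sketch (the /tmp/spark-events directory is an assumption and has to exist up front):

[source, scala]
----
import org.apache.spark.{SparkConf, SparkContext}

// Enable event logging; /tmp/spark-events is a hypothetical directory
// that has to exist before the application starts.
val conf = new SparkConf()
  .setAppName("event-logging-demo")
  .setMaster("local[*]")
  .set("spark.eventLog.enabled", "true")
  .set("spark.eventLog.dir", "/tmp/spark-events")
  .set("spark.eventLog.compress", "false")

// SparkContext creates and starts an EventLoggingListener,
// which creates the .inprogress event log file.
val sc = new SparkContext(conf)
----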

== [[creating-instance]] Creating Instance

EventLoggingListener takes the following to be created:

* [[appId]] Application ID
* [[appAttemptId]] Optional application attempt ID
* [[logBaseDir]] Base directory for event logs (Hadoop URI)
* [[sparkConf]] SparkConf
* [[hadoopConf]] Hadoop Configuration

EventLoggingListener initializes the <<internal-properties, internal properties>>.

NOTE: When created with no Hadoop Configuration, EventLoggingListener uses the SparkHadoopUtil utility to ROOT:spark-SparkHadoopUtil.md#newConfiguration[create a new one].

== [[logPath]] Event Log File

[source, scala]
----
logPath: String
----

logPath is the path of the event log file that is <<getLogPath, built>> from the <<logBaseDir, logBaseDir>>, <<appId, appId>>, <<appAttemptId, appAttemptId>> and the compression codec name (if compression is enabled).

NOTE: logPath is used when EventLoggingListener is requested to <<start, start>> and <<stop, stop>>.

== [[start]] Starting EventLoggingListener

[source, scala]
----
start(): Unit
----

start creates the name of the working event log file based on the <<appId, appId>> (e.g. local-1461696754069), with the <<appAttemptId, appAttemptId>> and the compression codec extension when defined, plus the <<inprogress-extension, .inprogress>> extension.

If the working log file already exists and ROOT:configuration-properties.md#spark.eventLog.overwrite[spark.eventLog.overwrite] is enabled, you should see the following WARN message in the logs and start attempts to delete the file:

----
Event log [path] already exists. Overwriting...
----

In case the file could not be deleted, the following WARN message is printed out to the logs:

----
Error deleting [path]
----

start then creates a buffered output stream for the working file and writes out the metadata as the first line: the JSON-encoded SparkListenerLogStart event with the Spark version, e.g.:

[source,plaintext]
----
{"Event":"SparkListenerLogStart","Spark Version":"2.0.0-SNAPSHOT"}
----

At this point, EventLoggingListener is ready for event logging and you should see the following INFO message in the logs:

----
Logging events to [logPath]
----

start throws an IllegalArgumentException when the <<logBaseDir, logBaseDir>> is not a directory:

----
Log directory [logBaseDir] is not a directory.
----

start is used when SparkContext is created.
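
The following is a minimal, self-contained sketch of the start flow described above (a standalone helper for illustration only, not Spark's actual private implementation; the println calls stand in for Spark's logging):

[source, scala]
----
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// A sketch of the start flow: delete a stale .inprogress file (when
// overwriting is enabled), create the working file and write out the
// SparkListenerLogStart line as the first line.
def start(logPath: String, shouldOverwrite: Boolean): Unit = {
  val workingPath = new Path(logPath + ".inprogress")
  val fs = workingPath.getFileSystem(new Configuration())

  if (fs.exists(workingPath) && shouldOverwrite) {
    println(s"Event log $workingPath already exists. Overwriting...")
    if (!fs.delete(workingPath, true)) {
      println(s"Error deleting $workingPath")
    }
  }

  val writer = new java.io.PrintWriter(fs.create(workingPath))
  // The first line is the JSON-encoded SparkListenerLogStart event.
  writer.println("""{"Event":"SparkListenerLogStart","Spark Version":"2.0.0-SNAPSHOT"}""")
  writer.flush()
  println(s"Logging events to $logPath")
}
----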

== [[logEvent]] Logging Event (In JSON Format)

[source, scala]
----
logEvent(
  event: SparkListenerEvent,
  flushLogger: Boolean = false): Unit
----


logEvent converts the given event to JSON (using JsonProtocol) and writes it out to the <<writer, PrintWriter>> as a single line. With flushLogger enabled, logEvent also flushes the writer (and the <<hadoopDataStream, Hadoop output stream>>, if used).
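
A conceptual sketch of that serialization (JsonProtocol is Spark-internal, i.e. private[spark], so the sketch assumes code living under the org.apache.spark namespace):

[source, scala]
----
package org.apache.spark.demo // assumption: subpackage for private[spark] access

import java.io.PrintWriter

import org.apache.spark.scheduler.SparkListenerEvent
import org.apache.spark.util.JsonProtocol
import org.json4s.jackson.JsonMethods.{compact, render}

object LogEventSketch {
  // Serialize the event to one JSON object per line and optionally flush.
  def logEvent(
      writer: PrintWriter,
      event: SparkListenerEvent,
      flushLogger: Boolean = false): Unit = {
    val eventJson = JsonProtocol.sparkEventToJson(event) // event => org.json4s.JValue
    writer.println(compact(render(eventJson)))
    if (flushLogger) writer.flush()
  }
}
----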

== [[stop]] Stopping EventLoggingListener

[source, scala]
----
stop(): Unit
----

stop closes the <<writer, PrintWriter>> for the log file and renames the file to remove the <<inprogress-extension, .inprogress>> extension.

If the target log file (one without the .inprogress extension) already exists and ROOT:configuration-properties.md#spark.eventLog.overwrite[spark.eventLog.overwrite] is enabled, stop overwrites the file. You should see the following WARN message in the logs:

----
Event log [target] already exists. Overwriting...
----

If the target log file exists and overwriting is disabled, a java.io.IOException is thrown with the following message:

----
Target log file already exists ([logPath])
----

NOTE: stop is executed when SparkContext is requested to stop.
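
A minimal sketch of the rename-with-overwrite step (a standalone helper for illustration, not Spark's actual private implementation):

[source, scala]
----
import org.apache.hadoop.fs.{FileSystem, Path}

// Rename the in-progress file to the final event log file,
// overwriting an existing target only when allowed.
def finalizeLog(fs: FileSystem, logPath: String, shouldOverwrite: Boolean): Unit = {
  val workingPath = new Path(logPath + ".inprogress")
  val target = new Path(logPath)
  if (fs.exists(target)) {
    if (shouldOverwrite) {
      println(s"Event log $target already exists. Overwriting...") // stands in for logWarning
      fs.delete(target, true)
    } else {
      throw new java.io.IOException(s"Target log file already exists ($logPath)")
    }
  }
  fs.rename(workingPath, target)
}
----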

== [[getLogPath]] getLogPath Utility

[source, scala]
----
getLogPath(
  logBaseDir: URI,
  appId: String,
  appAttemptId: Option[String],
  compressionCodecName: Option[String] = None): String
----


getLogPath creates the path of the event log file: the logBaseDir followed by the sanitized (lower-case, with special characters replaced) appId, with the appAttemptId appended after an underscore (if defined) and the compression codec name as the file extension (if defined). See the sketch after the note below.

NOTE: getLogPath is used when EventLoggingListener is <<creating-instance, created>> (for the <<logPath, event log file>>).
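
A sketch of that path assembly (sanitize mirrors how special characters are replaced in file names; treat the exact replacement rules as an approximation):

[source, scala]
----
import java.net.URI
import org.apache.hadoop.fs.Path

// Approximation of how special characters are replaced in file names.
def sanitize(str: String): String =
  str.replaceAll("[ :/]", "-").replaceAll("[.${}'\"]", "_").toLowerCase

def getLogPath(
    logBaseDir: URI,
    appId: String,
    appAttemptId: Option[String],
    compressionCodecName: Option[String] = None): String = {
  val base = new Path(logBaseDir).toString.stripSuffix("/") + "/" + sanitize(appId)
  val codec = compressionCodecName.map("." + _).getOrElse("")
  appAttemptId match {
    case Some(attempt) => base + "_" + sanitize(attempt) + codec
    case None          => base + codec
  }
}

// e.g. getLogPath(new URI("file:/tmp/spark-events"), "local-1461696754069", None)
//      => file:/tmp/spark-events/local-1461696754069
----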

== [[openEventLog]] openEventLog Utility

[source, scala]
----
openEventLog(
  log: Path,
  fs: FileSystem): InputStream
----


openEventLog opens the given event log file for reading and returns an InputStream (wrapped in a decompressing input stream when the log file's extension indicates a compression codec).

openEventLog is used when...FIXME
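
A conceptual sketch (the decompression step is only indicated in a comment since Spark's CompressionCodec is private[spark]):

[source, scala]
----
import java.io.{BufferedInputStream, FileNotFoundException, InputStream}
import org.apache.hadoop.fs.{FileSystem, Path}

// Open the event log for reading. The extension after the last dot, if
// any, names the compression codec (e.g. "local-123.lz4" => lz4).
def openEventLog(log: Path, fs: FileSystem): InputStream = {
  if (!fs.exists(log)) {
    throw new FileNotFoundException(s"File $log does not exist")
  }
  val in = new BufferedInputStream(fs.open(log))
  val codecName: Option[String] = log.getName.split("\\.").toSeq.tail.lastOption
  // A real implementation would wrap `in` in the codec's decompressing
  // input stream, roughly: CompressionCodec.createCodec(conf, name)
  //   .compressedInputStream(in) -- elided here (private[spark] API).
  codecName.foreach(name => println(s"Compression codec (by extension): $name"))
  in
}
----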

== [[logging]] Logging

Enable ALL logging level for org.apache.spark.scheduler.EventLoggingListener logger to see what happens inside.

Add the following line to conf/log4j.properties:

[source,plaintext]
----
log4j.logger.org.apache.spark.scheduler.EventLoggingListener=ALL
----

Refer to ROOT:spark-logging.md[Logging].

== [[internal-properties]] Internal Properties

=== [[hadoopDataStream]] FSDataOutputStream

Hadoop http://hadoop.apache.org/docs/r2.7.3/api/org/apache/hadoop/fs/FSDataOutputStream.html[FSDataOutputStream] (default: None)

Used when...FIXME

=== [[writer]] PrintWriter

{java-javadoc-url}/java/io/PrintWriter.html[java.io.PrintWriter] for <<logEvent, logging events>> to the <<logPath, event log file>>.

Initialized when EventLoggingListener is requested to <<start, start>>.

Used when EventLoggingListener is requested to <<logEvent, log an event>>.

Closed when EventLoggingListener is requested to <<stop, stop>>.

