EventLoggingListener

EventLoggingListener is a SparkListener that writes out JSON-encoded events of a Spark application when event logging is enabled (based on spark.eventLog.enabled configuration property).

EventLoggingListener supports custom configuration properties.

EventLoggingListener writes out log files to a directory (based on spark.eventLog.dir configuration property). All Spark events are logged (except SparkListenerBlockUpdated and SparkListenerExecutorMetricsUpdate).

Use Spark History Server to view the event logs in a browser (similar to the web UI of a Spark application).

EventLoggingListener uses .inprogress file extension for in-flight event log files of active Spark applications.

EventLoggingListener can compress events (based on spark.eventLog.compress configuration property).
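The three configuration properties above can be set in conf/spark-defaults.conf; a minimal example (the log directory is a placeholder to adjust for your environment):

```
spark.eventLog.enabled   true
spark.eventLog.dir       hdfs:///shared/spark-logs
spark.eventLog.compress  true
```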

Creating Instance

EventLoggingListener takes the following to be created:

EventLoggingListener initializes the internal properties.

When initialized with no Hadoop Configuration, EventLoggingListener uses SparkHadoopUtil utility to create a new one.

Event Log File

logPath: String

logPath is…​FIXME

logPath is used when EventLoggingListener is requested to start and stop.

Starting EventLoggingListener

start(): Unit

start deletes the logPath with the .inprogress extension.

The log file’s working name is created based on appId (with or without the compression codec used) and appAttemptId, e.g. local-1461696754069. It also uses the .inprogress extension.

If overwrite is enabled, you should see the WARN message:

Event log [path] already exists. Overwriting...

start attempts to delete the working .inprogress log file. If it cannot be deleted, the following WARN message is printed out to the logs:

Error deleting [path]

The buffered output stream is created with a metadata line as the first line, holding Spark’s version and the name of the SparkListenerLogStart class:

{"Event":"SparkListenerLogStart","Spark Version":"2.0.0-SNAPSHOT"}

At this point, EventLoggingListener is ready for event logging and you should see the following INFO message in the logs:

Logging events to [logPath]

start throws an IllegalArgumentException when the logBaseDir is not a directory:

Log directory [logBaseDir] is not a directory.

start is executed when SparkContext is created.

Logging Event (In JSON Format)

logEvent(
  event: SparkListenerEvent,
  flushLogger: Boolean = false): Unit

logEvent logs the given event as JSON.

FIXME
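A minimal sketch of the logEvent flow, assuming a pre-opened PrintWriter and a pre-serialized JSON string (the real implementation serializes SparkListenerEvents with JsonProtocol; the names below are illustrative, not Spark’s):

```scala
import java.io.{PrintWriter, StringWriter}

object LogEventSketch {
  // Write one JSON-encoded event per line; flush eagerly when requested
  // (e.g. for events that mark important application milestones).
  def logEvent(writer: PrintWriter, eventJson: String, flushLogger: Boolean = false): Unit = {
    writer.println(eventJson)
    if (flushLogger) writer.flush()
  }

  def main(args: Array[String]): Unit = {
    val out = new StringWriter
    val writer = new PrintWriter(out)
    logEvent(writer, """{"Event":"SparkListenerApplicationStart"}""", flushLogger = true)
    print(out.toString)
  }
}
```

The one-event-per-line layout is what makes the resulting file easy to replay line by line when the Spark History Server rebuilds the UI.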

Stopping EventLoggingListener

stop(): Unit

stop closes the PrintWriter for the log file and renames the file to remove the .inprogress extension.

If the target log file exists (one without .inprogress extension), it overwrites the file if spark.eventLog.overwrite is enabled. You should see the following WARN message in the logs:

Event log [target] already exists. Overwriting...

If the target log file exists and overwrite is disabled, a java.io.IOException is thrown with the following message:

Target log file already exists ([logPath])

stop is executed when SparkContext is requested to stop.
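The rename-on-stop logic can be sketched as below, using java.nio.file instead of Hadoop’s FileSystem API (an assumption for illustration only; the real listener works against a Hadoop FileSystem):

```scala
import java.io.IOException
import java.nio.file.{Files, Path}

object StopSketch {
  def finalizeLog(inProgress: Path, target: Path, overwrite: Boolean): Unit = {
    if (Files.exists(target)) {
      if (overwrite) {
        // mirrors the WARN: Event log [target] already exists. Overwriting...
        println(s"Event log $target already exists. Overwriting...")
        Files.delete(target)
      } else {
        // mirrors the IOException: Target log file already exists ([logPath])
        throw new IOException(s"Target log file already exists ($target)")
      }
    }
    Files.move(inProgress, target) // drop the .inprogress name
  }

  def main(args: Array[String]): Unit = {
    val dir = Files.createTempDirectory("spark-events")
    val inProgress = dir.resolve("local-1461696754069.inprogress")
    Files.write(inProgress, "{}".getBytes)
    finalizeLog(inProgress, dir.resolve("local-1461696754069"), overwrite = false)
    println(Files.exists(dir.resolve("local-1461696754069")))
  }
}
```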

getLogPath Utility

getLogPath(
  logBaseDir: URI,
  appId: String,
  appAttemptId: Option[String],
  compressionCodecName: Option[String] = None): String

getLogPath…​FIXME

getLogPath is used when EventLoggingListener is created (for the logPath).
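The path construction can be sketched in plain Scala. The sanitization rules below (lowercase; spaces, colons and slashes become dashes; a few special characters become underscores) are assumptions for illustration, not the actual implementation:

```scala
object GetLogPathSketch {
  // Assumed sanitization of app id and attempt id (illustrative only).
  private def sanitize(str: String): String =
    str.replaceAll("[ :/]", "-").replaceAll("[.${}'\"]", "_").toLowerCase

  def getLogPath(
      logBaseDir: String,
      appId: String,
      appAttemptId: Option[String],
      compressionCodecName: Option[String] = None): String = {
    val attempt = appAttemptId.map("_" + sanitize(_)).getOrElse("")
    val codec = compressionCodecName.map("." + _).getOrElse("")
    logBaseDir.stripSuffix("/") + "/" + sanitize(appId) + attempt + codec
  }

  def main(args: Array[String]): Unit = {
    println(getLogPath("/tmp/spark-events", "local-1461696754069", None))
    println(getLogPath("/tmp/spark-events", "app-1", Some("2"), Some("lz4")))
  }
}
```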

openEventLog Utility

openEventLog(
  log: Path,
  fs: FileSystem): InputStream

openEventLog…​FIXME

openEventLog is used when…​FIXME
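A hedged sketch of opening an event log for reading: wrap the raw stream and pick a decompressing stream by file extension. The real openEventLog works on a Hadoop FileSystem and Spark has its own CompressionCodec implementations (lz4, snappy, ...); GZIPInputStream stands in here only to keep the sketch dependency-free:

```scala
import java.io.{BufferedInputStream, FileInputStream, InputStream}
import java.util.zip.GZIPInputStream

object OpenEventLogSketch {
  def openEventLog(path: String): InputStream = {
    val in = new BufferedInputStream(new FileInputStream(path))
    // GZIP is a stand-in for Spark's codecs, chosen by extension.
    if (path.endsWith(".gz")) new GZIPInputStream(in) else in
  }

  def main(args: Array[String]): Unit = {
    val tmp = java.nio.file.Files.createTempFile("eventlog", "")
    java.nio.file.Files.write(tmp, "{\"Event\":\"SparkListenerLogStart\"}\n".getBytes)
    val in = openEventLog(tmp.toString)
    println(scala.io.Source.fromInputStream(in).mkString.trim)
    in.close()
  }
}
```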

Logging

Enable ALL logging level for org.apache.spark.scheduler.EventLoggingListener logger to see what happens inside.

Add the following line to conf/log4j.properties:

log4j.logger.org.apache.spark.scheduler.EventLoggingListener=ALL

Refer to Logging.

Internal Properties

FSDataOutputStream

Hadoop FSDataOutputStream (default: None)

Used when…​FIXME

PrintWriter

Initialized when EventLoggingListener is requested to start

Used when EventLoggingListener is requested to logEvent

Closed when EventLoggingListener is requested to stop