Observation

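Observation is a helper that simplifies collecting named metrics from batch queries with Dataset.observe. The metrics become available through Observation.get once the first action on the observed Dataset has finished.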

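import org.apache.spark.sql.Observation
import org.apache.spark.sql.functions._
// $"id" requires spark.implicits._ (imported automatically in spark-shell)

// Observe the highest id (max_id) in the Dataset while counting its rows
val observation = Observation("name")
val observed = ds.observe(observation, max($"id").as("max_id"))
observed.count()
val metrics = observation.get

// Separate example: observe row count (rows) and highest id (maxid) in the Dataset while writing it
val observation = Observation("my_metrics")
val observed_ds = ds.observe(observation, count(lit(1)).as("rows"), max($"id").as("maxid"))
observed_ds.write.parquet("ds.parquet")
val metrics = observation.get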

[SPARK-34806][SQL] Add Observation helper for Dataset.observe

Observation was added in 3.3.0 (this commit).

Creating Instance

Observation takes the following to be created:

  • Name (default: random UUID)

Observation is created using apply factories.

Creating Observation

apply(): Observation
apply(name: String): Observation

apply creates an Observation.
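
The two apply factories can be compared side by side. The following is a minimal sketch, not taken from the Spark sources: it assumes Spark 3.3.0 or later on the classpath and a local SparkSession, and the application name, the Dataset (spark.range) and the metric names are made up for illustration.

```scala
import org.apache.spark.sql.{Observation, SparkSession}
import org.apache.spark.sql.functions.{count, lit, max}

// Assumption: a throwaway local session; any existing SparkSession would do
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("observation-demo") // hypothetical application name
  .getOrCreate()
import spark.implicits._

// Illustrative Dataset with ids 1 to 100
val ds = spark.range(1, 101)

// apply(name: String): the Observation carries the given name
val named = Observation("my_metrics")
ds.observe(named, count(lit(1)).as("rows")).collect()

// apply(): the name defaults to a random UUID
val anonymous = Observation()
ds.observe(anonymous, max($"id").as("maxid")).collect()

// get blocks until the first action on the observed Dataset has finished
println(s"${named.name}: ${named.get}")
println(s"${anonymous.name}: ${anonymous.get}")

spark.stop()
```

Each Observation is attached to exactly one Dataset.observe call here, and an action (collect) runs before get is called, since get waits for the first action to complete.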