Evaluator — ML Pipeline Component for Model Scoring

Evaluator is the contract in Spark MLlib for ML Pipeline components that can evaluate models for given parameters.

ML Pipeline evaluators are components that take a DataFrame and compute a single metric (a Double) indicating how good a model is.

evaluator: DataFrame =[evaluate]=> Double

Evaluator is used to evaluate models and is usually (if not always) used for best model selection by CrossValidator and TrainValidationSplit.

Evaluator defines the isLargerBetter method to indicate whether the Double metric should be maximized (true) or minimized (false). By default, a larger value is considered better (true).
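To illustrate why isLargerBetter matters for model selection, here is a minimal sketch (not the Spark API; selectBest is a hypothetical helper) of how a selection loop can pick the best of several metric values:

```scala
// Sketch: pick the index of the best metric, honouring isLargerBetter.
// For accuracy-like metrics (larger is better) take the maximum;
// for error-like metrics (smaller is better) take the minimum.
def selectBest(metrics: Seq[Double], isLargerBetter: Boolean): Int =
  if (isLargerBetter) metrics.indexOf(metrics.max)
  else metrics.indexOf(metrics.min)
```

This mirrors what CrossValidator and TrainValidationSplit do after computing one metric per candidate model.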

Table 1. Evaluators

BinaryClassificationEvaluator: Evaluator of binary classification models
ClusteringEvaluator: Evaluator of clustering models
MulticlassClassificationEvaluator: Evaluator of multiclass classification models
RegressionEvaluator: Evaluator of regression models

Evaluating Model Output with Extra Parameters — evaluate Method

evaluate(dataset: Dataset[_], paramMap: ParamMap): Double

evaluate copies the extra paramMap into the Evaluator (via copy) and evaluates the model output with the copy, i.e. copy(paramMap).evaluate(dataset).
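The copy-then-evaluate pattern can be sketched in plain Scala without a Spark dependency. ToyEvaluator is hypothetical; a Seq of (prediction, label) pairs stands in for Dataset[_] and a plain Map for ParamMap:

```scala
// Sketch of the copy-then-evaluate pattern behind evaluate(dataset, paramMap).
// ToyEvaluator and the Map stand in for Spark's Evaluator and ParamMap.
class ToyEvaluator(val metricName: String) {

  // Compute the configured metric over (prediction, label) pairs.
  def evaluate(dataset: Seq[(Double, Double)]): Double = metricName match {
    case "mse"  => dataset.map { case (p, l) => (p - l) * (p - l) }.sum / dataset.size
    case "rmse" => math.sqrt(new ToyEvaluator("mse").evaluate(dataset))
  }

  // Copy this evaluator with the extra parameters merged in.
  def copy(extra: Map[String, String]): ToyEvaluator =
    new ToyEvaluator(extra.getOrElse("metricName", metricName))

  // Evaluate with extra parameters: copy first, then evaluate the copy.
  def evaluate(dataset: Seq[(Double, Double)], paramMap: Map[String, String]): Double =
    copy(paramMap).evaluate(dataset)
}
```

The original evaluator is left untouched; only the copy sees the extra parameters.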

evaluate is used…​FIXME

Evaluator Contract

package org.apache.spark.ml.evaluation

import org.apache.spark.ml.param.ParamMap
import org.apache.spark.sql.Dataset

abstract class Evaluator {
  def evaluate(dataset: Dataset[_]): Double
  def copy(extra: ParamMap): Evaluator
  def isLargerBetter: Boolean = true
}
Table 2. Evaluator Contract

copy: Used when…
evaluate: Used when…
isLargerBetter: Indicates whether the metric returned by evaluate should be maximized (true) or minimized (false). Gives true by default.
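To make the contract concrete, here is a plain-Scala mirror of it (EvaluatorSketch and MseEvaluator are hypothetical names, with no Spark dependency; a Seq of (prediction, label) pairs stands in for Dataset[_] and a plain Map for ParamMap). The MSE evaluator overrides isLargerBetter because a lower error is better:

```scala
// Plain-Scala mirror of the Evaluator contract (sketch, not the Spark API).
abstract class EvaluatorSketch {
  def evaluate(dataset: Seq[(Double, Double)]): Double
  def copy(extra: Map[String, String]): EvaluatorSketch
  def isLargerBetter: Boolean = true
}

// Mean squared error over (prediction, label) pairs.
class MseEvaluator extends EvaluatorSketch {
  def evaluate(dataset: Seq[(Double, Double)]): Double =
    dataset.map { case (p, l) => (p - l) * (p - l) }.sum / dataset.size
  def copy(extra: Map[String, String]): MseEvaluator = new MseEvaluator
  // A smaller error means a better model, so flip the default.
  override def isLargerBetter: Boolean = false
}
```

A model-selection loop would minimize this evaluator's metric, since isLargerBetter is false.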