Predictor is an Estimator for a PredictionModel with its own abstract train method.

train(dataset: DataFrame): M

Having train as the only method to implement is supposed to ease writing a Predictor: schema validation and copying parameters to the trained PredictionModel are handled by the fit method of Predictor, which also sets the parent of the model to the Predictor itself.

A Predictor is basically a function that maps a DataFrame onto a PredictionModel.

predictor: DataFrame =[train]=> PredictionModel

Predictor implements the abstract fit(dataset: DataFrame) of the Estimator abstract class. fit first validates and transforms the schema of the dataset (using a custom transformSchema of PipelineStage), and then calls the abstract train method.
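The flow described above (validate the schema, call train, set the parent) can be sketched as a template method. This is a minimal sketch in plain Scala, not Spark's code: ToyPredictor, ToyModel, Dataset, MeanPredictor and MeanModel are hypothetical stand-ins, the schema is reduced to a map from column name to type name, and parameter copying is left out.

```scala
// Toy stand-ins for illustration only; none of these are Spark classes.
object PredictorSketch {
  // Assumption: a schema is just column name -> type name.
  type Schema = Map[String, String]

  // Assumption: a dataset is a schema plus (features, label) rows.
  final case class Dataset(schema: Schema, rows: Seq[(Vector[Double], Double)])

  abstract class ToyModel {
    // Set by fit so the model can refer back to the predictor that produced it.
    var parent: Option[AnyRef] = None
    def predict(features: Vector[Double]): Double
  }

  abstract class ToyPredictor[M <: ToyModel] {
    // The only method a concrete predictor has to implement.
    protected def train(dataset: Dataset): M

    // Simplified check: only the presence of the input columns is verified here.
    def transformSchema(schema: Schema): Schema = {
      require(schema.contains("features"), "features column is missing")
      require(schema.contains("label"), "label column is missing")
      schema + ("prediction" -> "Double")
    }

    // Template method: validate first, then train, then link model and parent.
    final def fit(dataset: Dataset): M = {
      transformSchema(dataset.schema)
      val model = train(dataset)
      model.parent = Some(this)
      model
    }
  }

  // A trivial predictor whose model always predicts the mean label.
  final class MeanModel(mean: Double) extends ToyModel {
    def predict(features: Vector[Double]): Double = mean
  }

  final class MeanPredictor extends ToyPredictor[MeanModel] {
    protected def train(dataset: Dataset): MeanModel =
      new MeanModel(dataset.rows.map(_._2).sum / dataset.rows.size)
  }
}
```

With this shape, a concrete predictor such as the hypothetical MeanPredictor only supplies train, and every caller goes through fit, so validation cannot be skipped.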

Validating and transforming a schema (using transformSchema) makes sure that:

  1. the features column exists and is of the correct type (defaults to Vector).

  2. the label column exists and is of Double type.

As the last step, transformSchema adds the prediction column of Double type.
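The two checks and the appended column can be illustrated with a small, self-contained sketch. It assumes a simplified schema model: Field, DataType and the function below are hypothetical toy definitions, not Spark's StructType API, and the default column names are assumed to be features, label and prediction.

```scala
// Toy schema model for illustration only; not Spark's StructType/StructField.
object SchemaSketch {
  sealed trait DataType
  case object VectorType extends DataType
  case object DoubleType extends DataType
  case object StringType extends DataType

  final case class Field(name: String, dataType: DataType)

  // Checks the features and label columns, then appends the prediction column.
  def validateAndTransformSchema(
      schema: Seq[Field],
      featuresCol: String = "features",     // assumed default column name
      labelCol: String = "label",           // assumed default column name
      predictionCol: String = "prediction", // assumed default column name
      featuresType: DataType = VectorType): Seq[Field] = {

    // Fails if the column is missing or has a different type than expected.
    def check(col: String, expected: DataType): Unit = {
      val actual = schema.find(_.name == col).map(_.dataType)
      require(actual.contains(expected),
        s"Column $col must be of type $expected but was ${actual.getOrElse("missing")}")
    }

    check(featuresCol, featuresType) // 1. features column exists with the correct type
    check(labelCol, DoubleType)      // 2. label column exists and is Double
    schema :+ Field(predictionCol, DoubleType) // last step: add the prediction column
  }
}
```

Calling validateAndTransformSchema on a schema with a Vector features column and a Double label column returns the same fields followed by a Double prediction field; a label column of any other type makes require throw an IllegalArgumentException.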