Estimators — ML Pipeline Component

An estimator is an abstraction of a learning algorithm that fits a model on a dataset.

That was a very machine-learning way to explain an estimator, wasn't it? The more time I spend with the Pipeline API, the more often I use the terms and phrases from this space. Sorry.

Technically, an Estimator produces a Model (i.e. a Transformer) for a given DataFrame and parameters (as a ParamMap). The resulting Model can then calculate predictions for any DataFrame-based input dataset.

It is basically a function that maps a DataFrame onto a Model through the fit method, i.e. it takes a DataFrame and produces a Transformer as a Model.

estimator: DataFrame =[fit]=> Model
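To make the mapping concrete, here is a minimal sketch using LinearRegression as the Estimator. It assumes Spark MLlib is on the classpath and a local session is acceptable; the object name, the toy data and the maxIter value are made up for illustration.

```scala
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.ml.param.ParamMap
import org.apache.spark.ml.regression.{LinearRegression, LinearRegressionModel}
import org.apache.spark.sql.SparkSession

object EstimatorFitExample {
  // Local session for experimentation only (assumption: no cluster needed).
  val spark: SparkSession = SparkSession.builder()
    .master("local[*]")
    .appName("estimator-fit-sketch")
    .getOrCreate()

  // Made-up training data where label = 2 * feature.
  val training = spark.createDataFrame(Seq(
    (2.0, Vectors.dense(1.0)),
    (4.0, Vectors.dense(2.0)),
    (6.0, Vectors.dense(3.0))
  )).toDF("label", "features")

  // The Estimator: it holds the algorithm and its parameters, not a fitted model.
  val lr = new LinearRegression()

  // fit takes the DataFrame (and here an optional ParamMap) and returns a Model.
  val model: LinearRegressionModel = lr.fit(training, ParamMap(lr.maxIter -> 10))

  // The Model is a Transformer: transform appends a "prediction" column.
  val predictions = model.transform(training)
}
```

Calling `EstimatorFitExample.predictions.show()` in a Spark shell or application should display the input columns together with the added prediction column.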

Estimators are instances of the abstract class Estimator, which comes with a fit method (the return type M is a Model):

fit(dataset: DataFrame): M
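The shape of that contract can be sketched without Spark at all. The following is a toy model, not Spark's actual class hierarchy: `Dataset` here is just a sequence of (feature, label) pairs, and `SlopeEstimator` and `SlopeModel` are hypothetical names invented for this example.

```scala
object EstimatorSketch {
  // Toy stand-in for a DataFrame: rows of (feature, label).
  type Dataset = Seq[(Double, Double)]

  // A Transformer turns a dataset into predictions.
  trait Transformer {
    def transform(dataset: Dataset): Seq[Double]
  }

  // A Model is a Transformer that was produced by fitting.
  trait Model extends Transformer

  // An Estimator is parameterized by the type of Model that fit returns.
  abstract class Estimator[M <: Model] {
    def fit(dataset: Dataset): M
  }

  // Hypothetical model for y = slope * x.
  final case class SlopeModel(slope: Double) extends Model {
    def transform(dataset: Dataset): Seq[Double] =
      dataset.map { case (x, _) => slope * x }
  }

  // Hypothetical estimator: least-squares slope of a line through the origin.
  object SlopeEstimator extends Estimator[SlopeModel] {
    def fit(dataset: Dataset): SlopeModel = {
      val num = dataset.map { case (x, y) => x * y }.sum
      val den = dataset.map { case (x, _) => x * x }.sum
      SlopeModel(num / den)
    }
  }

  // Fit on data where label = 2 * feature, then predict for a new feature.
  val model: SlopeModel = SlopeEstimator.fit(Seq((1.0, 2.0), (2.0, 4.0), (3.0, 6.0)))
  val predictions: Seq[Double] = model.transform(Seq((4.0, 0.0)))
}
```

The type parameter `M` is what lets `fit` return the concrete model type (here `SlopeModel`) rather than a generic Transformer.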

An Estimator is a PipelineStage, so it can be part of a Pipeline.

A Pipeline treats an Estimator specially: it first calls the fit method and then calls transform on the resulting model, whereas for the other Transformer objects in the pipeline it calls transform only. Consult the Pipeline document.
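The fit-before-transform behaviour can be illustrated with a deliberately simplified, Spark-free sketch. All types and names below (`Doubler`, `MeanCenterer`, `fitPipeline`) are hypothetical stand-ins, and the loop ignores details of the real Pipeline implementation; it only shows the idea that an estimator stage is fitted on the data produced by the stages before it.

```scala
object PipelineSketch {
  // Toy stand-in for a DataFrame.
  type Dataset = Seq[Double]

  sealed trait PipelineStage
  trait Transformer extends PipelineStage { def transform(ds: Dataset): Dataset }
  trait Estimator extends PipelineStage { def fit(ds: Dataset): Transformer }

  // A plain transformer: doubles every value.
  object Doubler extends Transformer {
    def transform(ds: Dataset): Dataset = ds.map(_ * 2)
  }

  // An estimator: learns the mean, and its model subtracts that mean.
  object MeanCenterer extends Estimator {
    def fit(ds: Dataset): Transformer = {
      val mean = ds.sum / ds.size
      new Transformer { def transform(in: Dataset): Dataset = in.map(_ - mean) }
    }
  }

  // Simplified pipeline fitting: estimators are fitted first, then every
  // stage (fitted or not) transforms the data passed on to the next stage.
  def fitPipeline(stages: Seq[PipelineStage], training: Dataset): Seq[Transformer] = {
    val (_, fitted) = stages.foldLeft((training, Vector.empty[Transformer])) {
      case ((current, acc), stage) =>
        val transformer = stage match {
          case e: Estimator   => e.fit(current) // fit before transform
          case t: Transformer => t              // nothing to fit
        }
        (transformer.transform(current), acc :+ transformer)
    }
    fitted
  }

  // Applying a fitted pipeline is just transform after transform.
  def transformAll(fitted: Seq[Transformer], ds: Dataset): Dataset =
    fitted.foldLeft(ds)((acc, t) => t.transform(acc))

  val fitted: Seq[Transformer] = fitPipeline(Seq(Doubler, MeanCenterer), Seq(1.0, 2.0, 3.0))
  val result: Dataset = transformAll(fitted, Seq(1.0, 2.0, 3.0))
}
```

Here `MeanCenterer` is fitted on the doubled data (2.0, 4.0, 6.0), so the learned mean is 4.0 and the fitted pipeline maps the original input to values centered around zero.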