GeneralizedLinearRegression (GLM)

GeneralizedLinearRegression is a regression algorithm. It supports the following error distribution families:

  1. gaussian

  2. binomial

  3. poisson

  4. gamma

GeneralizedLinearRegression supports the following relationship between the linear predictor and the mean of the distribution function links:

  1. identity

  2. logit

  3. log

  4. inverse

  5. probit

  6. cloglog

  7. sqrt

GeneralizedLinearRegression supports 4096 features.

The label column has to be of DoubleType type.

GeneralizedLinearRegression belongs to org.apache.spark.ml.regression package.
import org.apache.spark.ml.regression._
val glm = new GeneralizedLinearRegression()

import org.apache.spark.ml.linalg._
val features = Vectors.sparse(5, Seq((3,1.0)))
val trainDF = Seq((0, features, 1)).toDF("id", "features", "label")
val glmModel = glm.fit(trainDF)

GeneralizedLinearRegression is a Regressor with features of Vector type that can train a GeneralizedLinearRegressionModel.

GeneralizedLinearRegressionModel

Regressor

Regressor is a custom Predictor.