GLM: Generalized linear model
In kota7/MLPipe: Machine Learning Pipeline


Fit a generalized linear model, optionally with L1 or elastic net regularization. Supports linear regression for continuous outputs, Poisson regression for count outputs, and logistic and multinomial regression for classification. Cox regression is not currently supported.

linear_regression(intercept = TRUE, standardize = TRUE,
    offset_index = 0, alpha = 1, lambda = 0.01, choose_lambda = FALSE,
    lambda_candidates = NULL, lambda_min_ratio = 0.0001, nlambda = 100,
    nfolds = 10, parallel = FALSE, loss = 'deviance',
    group_multinom = FALSE, modified_newton = FALSE, standardize_response = FALSE,
    lower = -Inf, upper = +Inf, max_included = NULL, force_exclude=integer(0),
    tol = 1e-7, maxit = 1e+5)

logistic_regression(intercept = TRUE, standardize = TRUE,
    offset_index = 0, alpha = 1, lambda = 0.01, choose_lambda = FALSE,
    lambda_candidates = NULL, lambda_min_ratio = 0.0001, nlambda = 100,
    nfolds = 10, parallel = FALSE, loss = 'deviance',
    group_multinom = FALSE, modified_newton = FALSE, standardize_response = FALSE,
    lower = -Inf, upper = +Inf, max_included = NULL, force_exclude=integer(0),
    tol = 1e-7, maxit = 1e+5)

poisson_regression(intercept = TRUE, standardize = TRUE,
    offset_index = 0, alpha = 1, lambda = 0.0001, choose_lambda = FALSE,
    lambda_candidates = NULL, lambda_min_ratio = 0.01, nlambda = 100,
    nfolds = 10, parallel = FALSE, loss = 'deviance',
    group_multinom = FALSE, modified_newton = FALSE, standardize_response = FALSE,
    lower = -Inf, upper = +Inf, max_included = NULL, force_exclude=integer(0),
    tol = 1e-7, maxit = 1e+5)

multinomial_regression(intercept = TRUE, standardize = TRUE,
    offset_index = 0, alpha = 1, lambda = 0.0001, choose_lambda = FALSE,
    lambda_candidates = NULL, lambda_min_ratio = 0.01, nlambda = 100,
    nfolds = 10, parallel = FALSE, loss = 'deviance',
    group_multinom = FALSE, modified_newton = FALSE, standardize_response = FALSE,
    lower = -Inf, upper = +Inf, max_included = NULL, force_exclude=integer(0),
    tol = 1e-7, maxit = 1e+5)

linear_regression_multi(intercept = TRUE, standardize = TRUE,
    offset_index = 0, alpha = 1, lambda = 0.01, choose_lambda = FALSE,
    lambda_candidates = NULL, lambda_min_ratio = 0.0001, nlambda = 100,
    nfolds = 10, parallel = FALSE, loss = 'deviance',
    group_multinom = FALSE, modified_newton = FALSE, standardize_response = FALSE,
    lower = -Inf, upper = +Inf, max_included = NULL, force_exclude=integer(0),
    tol = 1e-7, maxit = 1e+5)

intercept: logical indicating whether the model has a constant term
standardize: logical indicating whether the explanatory variables should be standardized before fitting
offset_index: integer, specifies which column of x is used as the offset. This is often used for Poisson regression, to accommodate differences in exposure periods across observations. Note that the multinomial model requires multiple offset variables.
alpha: numeric value between 0 and 1, specifies the weight of the L1 penalty relative to the L2 penalty. alpha=1 is pure L1 (lasso) regularization, while alpha=0 is pure L2 (ridge) regularization.
lambda: numeric, initial value(s) of the regularization parameter lambda; multiple values may be given. If NULL, chosen automatically.
choose_lambda: if TRUE, lambda is chosen by cross validation when the model is fitted
lambda_candidates: numeric vector; if specified, these are the lambda values at which models are fitted. If NULL, chosen automatically.
lambda_min_ratio: smallest lambda value, expressed as a fraction of the maximum lambda, which is computed automatically
nlambda: number of lambda values evaluated
nfolds: number of cross validation folds, used if choose_lambda=TRUE
parallel: logical indicating whether cross validation is run in parallel, used if choose_lambda=TRUE
loss: loss function used for cross validation; one of 'deviance', 'class', 'auc', 'mse', or 'mae'. Used if choose_lambda=TRUE
modified_newton: if TRUE, uses an upper bound on the Hessian instead of the exact Hessian
group_multinom: if TRUE, applies a grouped lasso penalty to the coefficients of each variable in the multinomial model
standardize_response: if TRUE, output variables are standardized in the multi-response Gaussian model
tol: numeric value of the convergence criterion
maxit: maximum number of iterations
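
To make the roles of alpha, lambda_min_ratio, and nlambda concrete, the sketch below uses base R only. It assumes the glmnet convention for the elastic net penalty, lambda * ((1 - alpha) / 2 * sum(beta^2) + alpha * sum(abs(beta))), and a log-spaced grid running from a maximum lambda down to lambda_min_ratio times that maximum. The helper names enet_penalty and lambda_grid are hypothetical, and lambda_max is passed in by hand rather than computed from the data as the backend would do.

```r
# Elastic net penalty under the glmnet convention (assumed here):
# alpha = 1 gives the pure L1 (lasso) penalty, alpha = 0 the pure L2 (ridge) penalty.
enet_penalty <- function(beta, lambda, alpha) {
  lambda * ((1 - alpha) / 2 * sum(beta^2) + alpha * sum(abs(beta)))
}

# Log-spaced grid of nlambda values from lambda_max down to
# lambda_min_ratio * lambda_max. lambda_max is supplied by the caller in this sketch.
lambda_grid <- function(lambda_max, lambda_min_ratio = 1e-4, nlambda = 100) {
  exp(seq(log(lambda_max), log(lambda_max * lambda_min_ratio), length.out = nlambda))
}

beta <- c(0.5, -2, 0)
enet_penalty(beta, lambda = 0.01, alpha = 1)   # L1 only: 0.01 * 2.5
enet_penalty(beta, lambda = 0.01, alpha = 0)   # L2 only: 0.01 * 4.25 / 2
grid <- lambda_grid(lambda_max = 1)
range(grid)                                    # smallest and largest lambda in the grid
```

Spacing the grid on the log scale puts more candidate values near the small-lambda end, where the fitted coefficients tend to change fastest.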

GLM class object

fit(x, y): fit the model
predict(x, ...): return predicted values
incr_fit(x, y): not implemented
predict_proba(x, ...): return probability prediction
get_coef(lambda = NULL, nonzero_only = FALSE): return the coefficients
mse(x, y): return the mean squared error
cross_entropy(x, y): return the cross entropy loss if appropriate
accuracy(x, y): return the classification accuracy if appropriate
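
For reference, the evaluation methods correspond to standard quantities. The base R sketch below writes them out under the usual textbook definitions; whether the class methods use exactly these normalizations (for example, the mean rather than the sum, or the natural logarithm) is an assumption, and the helper names are hypothetical.

```r
# Mean squared error between observed and predicted values.
mse_sketch <- function(y, yhat) mean((y - yhat)^2)

# Classification accuracy: share of predicted labels that match the observed ones.
accuracy_sketch <- function(y, pred) mean(as.character(y) == as.character(pred))

# Cross entropy: average negative log probability assigned to the observed class.
# `proba` is an n-by-K matrix whose column names are the class labels.
cross_entropy_sketch <- function(y, proba) {
  idx <- cbind(seq_along(y), match(as.character(y), colnames(proba)))
  -mean(log(proba[idx]))
}

y     <- c("A", "B", "A")
proba <- matrix(c(0.8, 0.2,
                  0.4, 0.6,
                  0.5, 0.5),
                ncol = 2, byrow = TRUE, dimnames = list(NULL, c("A", "B")))
# Predicted label = class with the highest probability (ties go to the first class).
pred  <- colnames(proba)[max.col(proba, ties.method = "first")]

accuracy_sketch(y, pred)
cross_entropy_sketch(y, proba)
mse_sketch(c(1, 2, 3), c(1, 2, 5))
```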

Uses glmnet and cv.glmnet as the backend.
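
As a rough illustration of what the backend does, the sketch below calls glmnet and cv.glmnet directly with arguments that mirror the defaults of linear_regression(). The correspondence between the wrapper's arguments and the glmnet arguments shown (for example, nfolds and loss mapping to nfolds and type.measure) is an assumption based on the matching names, not a statement of how the class is implemented.

```r
library(glmnet)

set.seed(123)
x <- matrix(rnorm(100 * 20), 100, 20)
y <- rnorm(100)

# Single fit at a fixed penalty, mirroring alpha = 1 and lambda = 0.01.
fit <- glmnet(x, y, family = "gaussian", alpha = 1, lambda = 0.01,
              intercept = TRUE, standardize = TRUE)
coef(fit)                      # sparse column: intercept plus 20 slopes

# Cross-validated choice of lambda, mirroring choose_lambda = TRUE.
cvfit <- cv.glmnet(x, y, family = "gaussian", alpha = 1,
                   nfolds = 10, type.measure = "deviance")
cvfit$lambda.min               # lambda with the lowest cross-validated loss
yhat <- predict(cvfit, newx = x, s = "lambda.min")
```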

library(MLPipe)
set.seed(123)
x <- matrix(rnorm(100*20),100,20)
y <- rnorm(100)
g <- linear_regression()
g$fit(x, y)
cor(y, g$predict(x))
g$mse(x, y)

# setting lambda=0 removes the penalty, so the coefficients should closely match lm
g <- linear_regression(lambda=0)
g$fit(x, y)
lmfit <- lm(y ~ x)
cbind(g$get_coef(), coefficients(lmfit))

# logistic regression
y <- sample(0:1, 100, replace=TRUE)
g <- logistic_regression()
g$fit(x, y)
table(g$predict(x), y)
g$accuracy(x, y)
g$cross_entropy(x, y)
# y can be factor or character
y <- sample(c('A', 'B'), 100, replace=TRUE)
g$fit(x, y)
table(g$predict(x), y)
g$accuracy(x, y)
g$cross_entropy(x, y)

# multinomial regression
y <- sample(c('l', 'm', 's'), 100, replace=TRUE)
g <- multinomial_regression()
g$fit(x, y)
table(g$predict(x), y)
g$accuracy(x, y)
g$cross_entropy(x, y)

# poisson regression
y <- sample(0:5, 100, prob=c(2,3,3,2,1,1), replace=TRUE)
g <- poisson_regression()
g$fit(x, y)
cor(y, g$predict(x))