GLM: Generalized linear model


Description

Fit a generalized linear model, optionally with L1 or elastic net regularization. Supports linear regression for continuous outputs, Poisson regression for count outputs, and logistic and multinomial regression for classification. Cox regression is not currently supported.

Usage

linear_regression(intercept = TRUE, standardize = TRUE,
    offset_index = 0, alpha = 1, lambda = 0.01, choose_lambda = FALSE,
    lambda_candidates = NULL, lambda_min_ratio = 0.0001, nlambda = 100,
    nfolds = 10, parallel = FALSE, loss = 'deviance',
    group_multinom = FALSE, modified_newton = FALSE, standardize_response = FALSE,
    lower = -Inf, upper = +Inf, max_included = NULL, force_exclude=integer(0),
    tol = 1e-7, maxit = 1e+5)

logistic_regression(intercept = TRUE, standardize = TRUE,
    offset_index = 0, alpha = 1, lambda = 0.01, choose_lambda = FALSE,
    lambda_candidates = NULL, lambda_min_ratio = 0.0001, nlambda = 100,
    nfolds = 10, parallel = FALSE, loss = 'deviance',
    group_multinom = FALSE, modified_newton = FALSE, standardize_response = FALSE,
    lower = -Inf, upper = +Inf, max_included = NULL, force_exclude=integer(0),
    tol = 1e-7, maxit = 1e+5)

poisson_regression(intercept = TRUE, standardize = TRUE,
    offset_index = 0, alpha = 1, lambda = 0.0001, choose_lambda = FALSE,
    lambda_candidates = NULL, lambda_min_ratio = 0.01, nlambda = 100,
    nfolds = 10, parallel = FALSE, loss = 'deviance',
    group_multinom = FALSE, modified_newton = FALSE, standardize_response = FALSE,
    lower = -Inf, upper = +Inf, max_included = NULL, force_exclude=integer(0),
    tol = 1e-7, maxit = 1e+5)

multinomial_regression(intercept = TRUE, standardize = TRUE,
    offset_index = 0, alpha = 1, lambda = 0.0001, choose_lambda = FALSE,
    lambda_candidates = NULL, lambda_min_ratio = 0.01, nlambda = 100,
    nfolds = 10, parallel = FALSE, loss = 'deviance',
    group_multinom = FALSE, modified_newton = FALSE, standardize_response = FALSE,
    lower = -Inf, upper = +Inf, max_included = NULL, force_exclude=integer(0),
    tol = 1e-7, maxit = 1e+5)

linear_regression_multi(intercept = TRUE, standardize = TRUE,
    offset_index = 0, alpha = 1, lambda = 0.01, choose_lambda = FALSE,
    lambda_candidates = NULL, lambda_min_ratio = 0.0001, nlambda = 100,
    nfolds = 10, parallel = FALSE, loss = 'deviance',
    group_multinom = FALSE, modified_newton = FALSE, standardize_response = FALSE,
    lower = -Inf, upper = +Inf, max_included = NULL, force_exclude=integer(0),
    tol = 1e-7, maxit = 1e+5)

Arguments

intercept

logical indicating whether the model includes a constant (intercept) term

standardize

logical indicating whether the explanatory variables are standardized before fitting

offset_index

integer specifying which column of x is used as the offset. This is often used in Poisson regression to accommodate differing exposure periods across observations. Note that the multinomial model (family == 'multinomial') requires multiple offset variables.

alpha

numeric value between 0 and 1 specifying the weight of L1 regularization relative to L2. alpha=1 means pure L1 (lasso) regularization, while alpha=0 means pure L2 (ridge) regularization.

lambda

initial value(s) of lambda; may be a vector of several values. If NULL, chosen automatically.

choose_lambda

if TRUE, lambda is chosen by cross-validation when the model is fitted

lambda_candidates

numeric vector; if specified, these are the lambda values at which models are fitted. If NULL, chosen automatically.

lambda_min_ratio

smallest value of lambda, expressed as a fraction of the maximum lambda, which is computed automatically

nlambda

number of lambda values evaluated

nfolds

number of cross-validation folds; used if choose_lambda=TRUE

parallel

logical indicating whether cross-validation is computed in parallel; used if choose_lambda=TRUE

loss

loss function used for cross-validation: one of 'deviance', 'class', 'auc', 'mse', or 'mae'; used if choose_lambda=TRUE

modified_newton

if TRUE, uses an upper bound on the Hessian instead of the exact Hessian

group_multinom

if TRUE, uses a group lasso penalty on each variable's coefficients in the multinomial model

standardize_response

if TRUE, the response variables are standardized in the multi-response Gaussian model

tol

numeric tolerance used as the convergence criterion

maxit

maximum number of iterations
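
The roles of alpha, lambda, lambda_min_ratio, and nlambda can be illustrated with a short base-R sketch. It assumes the penalty and the lambda grid follow the glmnet convention (the Details section names glmnet as the backend); the helper names elastic_net_penalty and lambda_grid are hypothetical and are not part of this package.

```r
# Elastic-net penalty in the glmnet convention (assumed here):
#   lambda * ( (1 - alpha) / 2 * sum(beta^2) + alpha * sum(abs(beta)) )
# elastic_net_penalty is a hypothetical helper, not a package function.
elastic_net_penalty <- function(beta, lambda, alpha) {
  lambda * ((1 - alpha) / 2 * sum(beta^2) + alpha * sum(abs(beta)))
}

beta <- c(0.5, -1.0, 0.0, 2.0)
elastic_net_penalty(beta, lambda = 0.01, alpha = 1)  # pure L1: 0.01 * 3.5
elastic_net_penalty(beta, lambda = 0.01, alpha = 0)  # pure L2: 0.01 * 5.25 / 2

# Log-spaced grid of nlambda values, running from lambda_max down to
# lambda_max * lambda_min_ratio. lambda_grid is also a hypothetical helper;
# in practice lambda_max is computed from the data by the backend.
lambda_grid <- function(lambda_max, lambda_min_ratio = 1e-4, nlambda = 100) {
  exp(seq(log(lambda_max), log(lambda_max * lambda_min_ratio),
          length.out = nlambda))
}

grid <- lambda_grid(lambda_max = 1)
length(grid)  # nlambda values
range(grid)   # smallest and largest lambda on the grid
```

Intermediate values of alpha blend the two penalties, and a smaller lambda_min_ratio extends the grid toward weaker regularization.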

Value

GLM class object

Class Methods

fit(x, y)

fit the model

predict(x, ...)

return predicted values

incr_fit(x, y)

not implemented

predict_proba(x, ...)

return predicted probabilities

get_coef(lambda = NULL, nonzero_only = FALSE)

return the coefficients

mse(x, y)

return the mean squared error

cross_entropy(x, y)

return the cross entropy loss if appropriate

accuracy(x, y)

return the classification accuracy if appropriate

Details

Uses glmnet and cv.glmnet (from the glmnet package) as the backend.
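
For orientation, the following sketch calls the backend functions directly. It requires the glmnet package. How this package's arguments are forwarded to glmnet is not documented here, so the correspondence suggested in the comments (for example nfolds and loss to nfolds and type.measure) is an assumption rather than a description of the implementation.

```r
library(glmnet)

set.seed(123)
x <- matrix(rnorm(100 * 20), 100, 20)
y <- rnorm(100)

# Fit at a single fixed lambda (roughly the choose_lambda = FALSE case).
fit <- glmnet(x, y, family = "gaussian", alpha = 1, lambda = 0.01,
              standardize = TRUE, intercept = TRUE)
coef(fit)  # sparse matrix: intercept plus one row per column of x

# Let cross-validation pick lambda (roughly the choose_lambda = TRUE case).
cvfit <- cv.glmnet(x, y, family = "gaussian", alpha = 1,
                   nfolds = 10, type.measure = "deviance")
cvfit$lambda.min  # lambda with the smallest cross-validated loss

# Predict at the selected lambda.
pred <- predict(cvfit, newx = x, s = "lambda.min")
dim(pred)
```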

Examples

# linear regression on simulated data
set.seed(123)
x <- matrix(rnorm(100*20),100,20)
y <- rnorm(100)
g <- linear_regression()
g$fit(x, y)
cor(y, g$predict(x))  # in-sample correlation of observed and predicted values
g$mse(x, y)

# setting lambda=0 is equivalent to lm
g <- linear_regression(lambda=0)
g$fit(x, y)
lmfit <- lm(y ~ x)
cbind(g$get_coef(), coefficients(lmfit))

# logistic regression
y <- sample(0:1, 100, replace=TRUE)
g <- logistic_regression()
g$fit(x, y)
table(g$predict(x), y)
g$accuracy(x, y)
g$cross_entropy(x, y)
# y can be factor or character
y <- sample(c('A', 'B'), 100, replace=TRUE)
g$fit(x, y)
table(g$predict(x), y)
g$accuracy(x, y)
g$cross_entropy(x, y)

# multinomial regression
y <- sample(c('l', 'm', 's'), 100, replace=TRUE)
g <- multinomial_regression()
g$fit(x, y)
table(g$predict(x), y)
g$accuracy(x, y)
g$cross_entropy(x, y)

# poisson regression
y <- sample(0:5, 100, prob=c(2,3,3,2,1,1), replace=TRUE)
g <- poisson_regression()
g$fit(x, y)
cor(y, g$predict(x))
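
The offset_index argument is motivated by Poisson models in which observations cover different exposure periods. The idea can be sketched with base R's glm rather than this package's API: the offset enters the linear predictor with a fixed coefficient of 1. Whether the offset column of x should hold the exposure or its logarithm is not stated in this page; the sketch assumes the log scale, which is what a log-link offset uses. The variable names and simulated coefficients are illustrative only.

```r
set.seed(123)
n <- 200
exposure <- runif(n, 0.5, 2)   # observation period for each row (simulated)
x1 <- rnorm(n)

# Counts whose expected value scales with exposure.
y_count <- rpois(n, lambda = exposure * exp(0.3 + 0.5 * x1))

# log(exposure) enters as an offset, so its coefficient is fixed at 1
# and the remaining coefficients describe the rate per unit of exposure.
fit_offset <- glm(y_count ~ x1 + offset(log(exposure)), family = poisson())
coef(fit_offset)
```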

kota7/MLPipe documentation built on May 5, 2019, 5:53 p.m.