grpnet: Fit a Group Elastic Net Regularized GLM/GAM

View source: R/grpnet.R

grpnetR Documentation

Fit a Group Elastic Net Regularized GLM/GAM

Description

Fits generalized linear/additive models with a group elastic net penalty using an adaptively bounded gradient descent (ABGD) algorithm (Helwig, 2024). Predictor groups can be manually input (default S3 method) or inferred from the model (S3 "formula" method). The regularization path is computed at a data-generated (default) or user-provided sequence of lambda values.

Usage

grpnet(x, ...)

## Default S3 method:
grpnet(x, 
       y, 
       group, 
       family = c("gaussian", "binomial", "multinomial", "poisson", 
                  "negative.binomial", "Gamma", "inverse.gaussian"),
       weights = NULL, 
       offset = NULL, 
       alpha = 1, 
       nlambda = 100,
       lambda.min.ratio = ifelse(nobs < nvars, 0.05, 0.0001), 
       lambda = NULL, 
       penalty.factor = NULL,
       penalty = c("LASSO", "MCP", "SCAD"),
       gamma = 4,
       theta = 1,
       standardized = !orthogonalized,
       orthogonalized = TRUE,
       intercept = TRUE, 
       thresh = 1e-04, 
       maxit = 1e05,
       proglang = c("Fortran", "R"),
       ...)
      
## S3 method for class 'formula'
grpnet(formula,
       data, 
       use.rk = TRUE,
       family = c("gaussian", "binomial", "multinomial", "poisson", 
                  "negative.binomial", "Gamma", "inverse.gaussian"),
       weights = NULL,
       offset = NULL,
       alpha = 1,
       nlambda = 100,
       lambda.min.ratio = ifelse(nobs < nvars, 0.05, 0.0001),
       lambda = NULL,
       penalty.factor = NULL,
       penalty = c("LASSO", "MCP", "SCAD"),
       gamma = 4,
       theta = 1,
       standardized = !orthogonalized,
       orthogonalized = TRUE,
       thresh = 1e-04,
       maxit = 1e05,
       proglang = c("Fortran", "R"),
       ...)

Arguments

x

Model (design) matrix of dimension nobs by nvars (n \times p).

y

Response vector of length n. Matrix inputs are allowed for binomial and multinomial families (see "Binomial and multinomial" section).

group

Group label vector (factor, character, or integer) of length p. Predictors with the same label are grouped together for regularization.

formula

Model formula: a symbolic description of the model to be fitted. Uses the same syntax as lm and glm.

data

Optional data frame containing the variables referenced in formula.

use.rk

If TRUE (default), the rk.model.matrix function is used to build the model matrix. Otherwise, the model.matrix function is used to build the model matrix. Additional arguments to the rk.model.matrix function can be passed via the ... argument.

family

Character specifying the assumed distribution for the response variable. Partial matching is allowed. Options include "gaussian" (real-valued response), "binomial" (binary response), "multinomial" (multi-class response), "poisson" (count response), "negative.binomial" (count response), "Gamma" (positive real-valued), or "inverse.gaussian" (positive real-valued).

weights

Optional vector of length n with non-negative weights to use for weighted (penalized) likelihood estimation. Defaults to a vector of ones.

offset

Optional vector of length n with an a priori known term to be included in the model's linear predictor. Defaults to a vector of zeros.

alpha

Regularization hyperparameter satisfying 0 \leq \alpha \leq 1 that gives the balance between the group L1 (lasso) and group L2 (ridge) penalty. Setting \alpha = 1 uses a group lasso penalty, setting \alpha = 0 uses a group ridge penalty, and setting 0 < \alpha < 1 uses a group elastic net group penalty.

nlambda

Number of \lambda values to use in the regularization path. Ignored if lambda is provided.

lambda.min.ratio

The proportion 0 < \pi < 1 that defines the minimum regularization parameter \lambda_{\mathrm{min}} as a fraction of the maximum regularization parameter \lambda_{\mathrm{max}} via the relationship \lambda_{\mathrm{min}} = \pi \lambda_{\mathrm{max}}. Ignored if lambda is provided. Note that \lambda_{\mathrm{max}} is defined such that all penalized effects are shrunk to zero.

lambda

Optional vector of user-supplied regularization parameter values.

penalty.factor

Default S3 method: vector of length K giving the non-negative penalty weight for each predictor group. The order of the weights should correspond to the order of levels(as.factor(group)). Defaults to \sqrt{p_k} for all k = 1,\ldots,K, where p_k is the number of coefficients in the k-th group. If penalty.factor[k] = 0, then the k-th group is unpenalized, and the corresponding term is always included in the model.

S3 "formula" method: named list giving the non-negative penalty weight for terms specified in the formula. Incomplete lists are allowed. Any term that is specified in formula but not in penalty.factor will be assigned the default penalty weight of \sqrt{p_k}. If penalty.factor$z = 0, then the variable z is unpenalized and always included in the model.

penalty

Character specifying which (group) penalty to use: LASSO , MCP, or SCAD.

gamma

Penalty hyperparameter that satisfies \gamma > 1 for MCP and \gamma > 2 for SCAD. Ignored for LASSO penalty.

theta

Additional ("size") parameter for negative binomial responses, where the variance function is defined as V(\mu) = \mu + \mu^2/ \theta

standardized

Logical indicating whether the predictors should be groupwise standardized. If TRUE, each column of x is mean-centered and each predictor group's design matrix is scaled to have a mean-square of one before fitting the model. Regardless of whether standardization is used, the coefficients are always returned on the original data scale.

orthogonalized

Logical indicating whether the predictors should be groupwise orthogonalized. If TRUE (default), each predictor group's design matrix is orthonormalized (i.e., \mathbf{X}_k^\top \mathbf{X}_k = n \mathbf{I}_k) before fitting the model. Regardless of whether orthogonalization is used, the coefficients are always returned on the original data scale.

intercept

Logical indicating whether an intercept term should be included in the model. Note that the intercept is always unpenalized.

thresh

Convergence threshold (tolerance). The algorithm is determined to have converged once the maximum relative change in the coefficients is below this threshold. See "Convergence" section.

maxit

Maximum number of iterations to allow.

proglang

Which programming language should be used to implement the ABGD algorithm? Options include "Fortran" (default) or "R".

...

Additional arguments used by the default or formula method.

Details

Consider a generalized linear model of the form

g(\mu) = \mathbf{X}^\top \boldsymbol\beta

where \mu = E(Y | \mathbf{X}) is the conditional expectation of the response Y given the predictor vector \mathbf{X}, the function g(\cdot) is a user-specified (invertible) link function, and \boldsymbol\beta are the unknown regression coefficients. Furthermore, suppose that the predictors are grouped, such as

\mathbf{X}^\top \boldsymbol\beta = \sum_{k=1}^K \mathbf{X}_k^\top \boldsymbol\beta_k

where \mathbf{X} = (\mathbf{X}_1, \ldots, \mathbf{X}_K) is the grouped predictor vector, and \boldsymbol\beta = (\boldsymbol\beta_1, \ldots, \boldsymbol\beta_K) is the grouped coefficient vector.

Given n observations, this function finds the \boldsymbol\beta that minimizes

L(\boldsymbol\beta | \mathbf{D}) + \lambda P_\alpha(\boldsymbol\beta)

where L(\boldsymbol\beta | \mathbf{D}) is the loss function with \mathbf{D} = \{\mathbf{y}, \mathbf{X}\} denoting the observed data, P_\alpha(\boldsymbol\beta) is the group elastic net penalty, and \lambda \geq 0 is the regularization parameter.

The loss function has the form

L(\boldsymbol\beta | \mathbf{D}) = \frac{1}{n} \sum_{i=1}^n w_i \ell_i(\boldsymbol\beta | \mathbf{D}_i)

where w_i > 0 are the user-supplied weights, and \ell_i(\boldsymbol\beta | \mathbf{D}_i) is the i-th observation's contribution to the loss function. Note that \ell(\cdot) = -\log(f_Y(\cdot)) denotes the negative log-likelihood function for the given family.

The group elastic net penalty function has the form

P_\alpha(\boldsymbol\beta) = \alpha P_1(\boldsymbol\beta) + (1 - \alpha) P_2(\boldsymbol\beta)

where \alpha \in [0,1] is the user-specified alpha value,

P_1(\boldsymbol\beta) = \sum_{k=1}^K \omega_k \| \boldsymbol\beta_k \|

is the group lasso penalty with \omega_k \geq 0 denoting the k-th group's penalty.factor, and

P_2(\boldsymbol\beta) = \frac{1}{2} \sum_{k=1}^K \omega_k \| \boldsymbol\beta_k \|^2

is the group ridge penalty. Note that \| \boldsymbol\beta_k \|^2 = \boldsymbol\beta_k^\top \boldsymbol\beta_k denotes the squared Euclidean norm. When penalty %in% c("MCP", "SCAD"), the group L1 penalty P_1(\boldsymbol\beta) is replaced by the group MCP or group SCAD penalty.

Value

An object of class "grpnet" with the following elements:

call

matched call

a0

intercept sequence of length nlambda

beta

coefficient matrix of dimension nvars by nlambda

alpha

balance between the group L1 (lasso) and group L2 (ridge) penalty

lambda

sequence of regularization parameter values

family

exponential family defining the loss function

dev.ratio

proportion of (null) deviance explained for each lambda (= 1 - dev / nulldev)

nulldev

null deviance for each lambda

df

effective degrees of freedom for each lambda

nzgrp

number of non-zero groups for each lambda

nzcoef

number of non-zero coefficients for each lambda

xsd

standard deviation of x for each group

ylev

levels of response variable (only for binomial and multinomial families)

nobs

number of observations

group

group label vector

ngroups

number of groups K

npasses

number of iterations for each lambda

time

runtime in seconds to compute regularization path

offset

logical indicating if an offset was included

args

list of input argument values

formula

input formula (possibly after expansion)

term.labels

terms that appear in formula (if applicable)

rk.args

arguments for rk.model.matrix function (if applicable)

S3 "formula" method

Important: When using the S3 "formula" method, the S3 "predict" method forms the model matrix for the predictions by applying the model formula to the new data. As a result, to ensure that the corresponding S3 "predict" method works correctly, some formulaic features should be avoided.

Polynomials: When including polynomial terms, the poly function should be used with option raw = TRUE. Default use of the poly function (with raw = FALSE) will work for fitting the model, but will result in invalid predictions for new data. Polynomials can also be included via the I function, but this isn't recommended because the polynomials terms wouldn't be grouped together, i.e., the terms x and I(x^2) would be treated as two separate groups of size one instead of a single group of size two.

Splines: B-splines (and other spline bases) can be included via the S3 "formula" method. However, to ensure reasonable predictions for new data, it is necessary to specify the knots directly. For example, if x is a vector with entries between zero and one, the code bs(x, df = 5) will *not* produce valid predictions for new data, but the code bs(x, knots = c(0.25, 0.5, 0.75), Boundary.knots = c(0, 1)) will work as intended. Instead of attempting to integrate a call to bs() or rk() into the model formula, it is recommended that splines be included via the use.rk = TRUE argument.

Family argument and link functions

Unlike the glm function, the family argument of the grpnet function
* should be a character vector (not a family object)
* does not allow for specification of a link function

Currently, there is only one available link function for each family:
* gaussian (identity): \mu = \mathbf{X}^\top \boldsymbol\beta
* binomial (logit): \log(\frac{\pi}{1 - \pi}) = \mathbf{X}^\top \boldsymbol\beta
* multinomial (symmetric): \pi_\ell = \frac{\exp(\mathbf{X}^\top \boldsymbol\beta_\ell)}{\sum_{l = 1}^m \exp(\mathbf{X}^\top \boldsymbol\beta_l)}
* poisson (log): \log(\mu) = \mathbf{X}^\top \boldsymbol\beta
* negative.binomial (log): \log(\mu) = \mathbf{X}^\top \boldsymbol\beta
* Gamma (log): \log(\mu) = \mathbf{X}^\top \boldsymbol\beta
* inverse.gaussian (log): \log(\mu) = \mathbf{X}^\top \boldsymbol\beta

Binomial and multinomial

For "binomial" responses, three different possibilities exist for the input response:
1. vector coercible into a factor with two levels
2. matrix with two columns (# successes, # failures)
3. numeric vector with entries between 0 and 1
In this case, the weights argument should be used specify the total number of trials.

For "multinomial" responses, two different possibilities exist for the input reponse:
1. vector coercible into a factor with more than two levels
2. matrix of integers (counts) for each category level

Convergence

The algorithm is determined to have converged once

\max_j \frac{| \beta_j - \beta_j^{\mathrm{old}} |}{1 + |\beta_j^{\mathrm{old}}|} < \epsilon

where j \in \{1,\ldots,p\} and \epsilon is the thresh argument.

Note

The syntax of (the default S3 method for) this function closely mimics that of the glmnet function in the glmnet package (Friedman, Hastie, & Tibshirani, 2010).

Author(s)

Nathaniel E. Helwig <helwig@umn.edu>

References

Friedman, J., Hastie, T., & Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1), 1-22. \Sexpr[results=rd]{tools:::Rd_expr_doi("10.18637/jss.v033.i01")}

Helwig, N. E. (2024). Versatile descent algorithms for group regularization and variable selection in generalized linear models. Journal of Computational and Graphical Statistics. \Sexpr[results=rd]{tools:::Rd_expr_doi("10.1080/10618600.2024.2362232")}

See Also

plot.grpnet for plotting the regularization path

predict.grpnet for predicting from grpnet objects

cv.grpnet for k-fold cross-validation of lambda

Examples

######***######   family = "gaussian"   ######***######

# load data
data(auto)

# fit model (formula method, response = mpg)
mod <- grpnet(mpg ~ ., data = auto)

# print regularization path info
mod

# plot coefficient paths
plot(mod)



######***######   family = "binomial"   ######***######

# load data
data(auto)

# redefine origin (Domestic vs Foreign)
auto$origin <- ifelse(auto$origin == "American", "Domestic", "Foreign")

# fit model (formula method, response = origin with 2 levels)
mod <- grpnet(origin ~ ., data = auto, family = "binomial")

# print regularization path info
mod

# plot coefficient paths
plot(mod)



######***######   family = "multinomial"   ######***######

# load data
data(auto)

# fit model (formula method, response = origin with 3 levels)
mod <- grpnet(origin ~ ., data = auto, family = "multinomial")

# print regularization path info
mod

# plot coefficient paths
plot(mod)



######***######   family = "poisson"   ######***######

# load data
data(auto)

# fit model (formula method, response = horsepower)
mod <- grpnet(horsepower ~ ., data = auto, family = "poisson")

# print regularization path info
mod

# plot coefficient paths
plot(mod)



######***######   family = "negative.binomial"   ######***######

# load data
data(auto)

# fit model (formula method, response = horsepower)
mod <- grpnet(horsepower ~ ., data = auto, family = "negative.binomial")

# print regularization path info
mod

# plot coefficient paths
plot(mod)



######***######   family = "Gamma"   ######***######

# load data
data(auto)

# fit model (formula method, response = mpg)
mod <- grpnet(mpg ~ ., data = auto, family = "Gamma")

# print regularization path info
mod

# plot coefficient paths
plot(mod)



######***######   family = "inverse.gaussian"   ######***######

# load data
data(auto)

# fit model (formula method, response = mpg)
mod <- grpnet(mpg ~ ., data = auto, family = "inverse.gaussian")

# print regularization path info
mod

# plot coefficient paths
plot(mod)


grpnet documentation built on Oct. 12, 2024, 1:07 a.m.

Related to grpnet in grpnet...