fit a negative binomial model with lasso (or elastic net), snet and mnet regularization

Description

Fit a negative binomial linear model via penalized maximum likelihood. The regularization path is computed for the lasso (or elastic net penalty), snet and mnet penalty, at a grid of values for the regularization parameter lambda.

Usage

1
2
3
4
5
6
7
glmregNB(formula, data, weights, nlambda = 100, lambda=NULL, lambda.min.ratio = 
ifelse(nobs<nvars,0.05,0.001), alpha=1, gamma=3, rescale=TRUE, standardize = TRUE, 
penalty.factor = rep(1, nvars), thresh = 0.001, maxit.theta = 25, maxit=1000, 
eps=.Machine$double.eps, trace=FALSE, start = NULL, etastart = NULL, mustart = NULL, 
theta.est=TRUE, theta0=NULL, init.theta=ifelse(theta.est, theta0[1],NULL),link=log, 
penalty=c("enet","mnet","snet"), method="glmreg_fit", model=TRUE, 
x.keep=FALSE, y.keep=TRUE, contrasts=NULL, convex=FALSE, ...)

Arguments

formula

formula used to describe a model.

data

argument controlling formula processing via model.frame.

weights

observation weights. Default is 1 for each observation

nlambda

The number of lambda values - default is 100.

lambda

A user supplied lambda sequence

lambda.min.ratio

Smallest value for lambda, as a fraction of lambda.max, the (data derived) entry value (i.e. the smallest value for which all coefficients are zero). The default depends on the sample size nobs relative to the number of variables nvars. If nobs > nvars, the default is 0.001, close to zero. If nobs < nvars, the default is 0.05.

alpha

The L2 penalty mixing parameter, with 0≤α≤ 1. alpha=1 is lasso (mcp, scad) penalty; and alpha=0 the ridge penalty.

gamma

The tuning parameter of the snet or mnet penalty.

rescale

logical value, if TRUE, adaptive rescaling of the penalty parameter for penalty="mnet" or penalty="snet" with family other than "gaussian". See reference

standardize

Logical flag for x variable standardization, prior to fitting the model sequence. The coefficients are always returned on the original scale. Default is standardize=TRUE. If variables are in the same units already, you might not wish to standardize.

penalty.factor

This is a number that multiplies lambda to allow differential shrinkage of coefficients. Can be 0 for some variables, which implies no shrinkage, and that variable is always included in the model. Default is same shrinkage for all variables.

thresh

Convergence threshold for coordinate descent. Defaults value is 1e-6.

maxit.theta

Maximum number of iterations for estimating theta scaling parameter

maxit

Maximum number of coordinate descent iterations for each lambda value; default is 1000.

eps

If a number is less than eps in magnitude, then this number is considered as 0

trace

If TRUE, fitting progress is reported

start, etastart, mustart, ...

arguments for the link{glmreg} function

init.theta

initial scaling parameter theta

theta.est

Estimate scale parameter theta? Default is TRUE. Note, the algorithm may become slow. In this case, one may use glmreg function with family="negbin", and a fixed theta

.

theta0

initial scale parameter vector theta, with length nlambda if theta.est=FALSE. Default is NULL

convex

Calculate index for which objective function ceases to be locally convex? Default is FALSE and only useful if penalty="mnet" or "snet".

link

link function, default is log

penalty

Type of regularization

method

estimation method

model, x.keep, y.keep

logicals. If TRUE the corresponding components of the fit (model frame, response, model matrix) are returned.

contrasts

the contrasts corresponding to levels from the respective models

Details

The sequence of models implied by lambda is fit by coordinate descent. This is a lasso (mcp, scad) or elastic net (mnet, snet) regularization path for fitting the negative binomial linear regression paths, by maximizing the penalized log-likelihood. Note that the objective function is

-∑ (weights * loglik) + λ*penalty

if standardize=FALSE and

-\frac{weights}{∑(weights)} * loglik + λ*penalty

if standardize=TRUE.

Value

An object with S3 class "glmreg", "glmregNB" for the various types of models.

call

the call that produced the model fit

b0

Intercept sequence of length length(lambda)

beta

A nvars x length(lambda) matrix of coefficients.

lambda

The actual sequence of lambda values used

dev

The computed deviance. The deviance calculations incorporate weights if present in the model. The deviance is defined to be 2*(loglike_sat - loglike), where loglike_sat is the log-likelihood for the saturated model (a model with a free parameter per observation).

nulldev

Null deviance (per observation). This is defined to be 2*(loglike_sat -loglike(Null)); The NULL model refers to the intercept model.

nobs

number of observations

Author(s)

Zhu Wang <zwang@connecticutchildrens.org>

References

Breheny, P. and Huang, J. (2011) Coordinate descent algorithms for nonconvex penalized regression, with applications to biological feature selection. Ann. Appl. Statist., 5: 232-253.

Zhu Wang, Shuangge Ma, Michael Zappitelli, Chirag Parikh, Ching-Yun Wang and Prasad Devarajan (2014) Penalized Count Data Regression with Application to Hospital Stay after Pediatric Cardiac Surgery, Statistical Methods in Medical Research. 2014 Apr 17. [Epub ahead of print]

See Also

print, predict, coef and plot methods, and the cv.glmregNB function.

Examples

1
2
3
4
5
6
7
8
9
## Not run: 
data("bioChemists", package = "pscl")
fm_nb <- glmregNB(art ~ ., data = bioChemists)
coef(fm_nb)
### ridge regression
fm <- glmregNB(art ~ ., alpha=0, data = bioChemists, lambda=seq(0.001, 1, by=0.01))
fm <- cv.glmregNB(art ~ ., alpha=0, data = bioChemists, lambda=seq(0.001, 1, by=0.01))

## End(Not run)

Want to suggest features or report bugs for rdrr.io? Use the GitHub issue tracker.