Fits a Generalised Linear Model with a LASSO (or L1) penalty, given a value of the penalty parameter.

Description

Fits a generalised linear model with a LASSO penalty, using an iteratively reweighted local linearisation approach, given a value of the penalty parameter (lambda). It can handle the negative binomial family, even when the overdispersion parameter is unknown, as well as other GLM families.

Usage

glm1(y, X, lambda, family = "negative.binomial", weights = rep(1, length(y)),
     b.init = NA, phi.init = NA, phi.method = "ML", tol = c(1e-08, .Machine$double.eps),
     n.iter = 100, phi.iter = 1)

Arguments

y

A vector of values for the response variable.

X

A design matrix of p explanatory variables.

family

The family of the response variable, see family. Negative binomial with unknown overdispersion can be specified as "negative.binomial", and is the default.

lambda

The penalty parameter applied to slope parameters. Different penalties can be specified for different parameters by specifying lambda as a vector whose length is the number of columns of X. If lambda is a scalar, the penalty is applied uniformly across all parameters except the first (which is assumed to be an intercept).

weights

Observation weights. These might be useful if you want to fit a Poisson point process model...

b.init

Initial slope estimate. Must be a vector of the same length as the number of columns in X.

phi.init

Initial estimate of the negative binomial overdispersion parameter. Must be scalar.

phi.method

Method of estimating overdispersion.

tol

A vector of two values, specifying convergence tolerance, and the value to truncate fitted values at.

n.iter

Number of iterations to attempt before bailing.

phi.iter

Number of iterations estimating the negative binomial overdispersion parameter (if applicable) before returning to slope estimation. Default is one step, i.e. iterating between one-step estimates of beta and phi.
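
The two forms of lambda described above can be built with base R alone. The sketch below assumes a hypothetical design matrix X whose first column is an intercept. The penalty values are purely illustrative, and using a zero in the first position to leave the intercept unpenalised is an assumption based on the scalar behaviour described for lambda.

```r
# Hypothetical design matrix: an intercept column plus three predictors.
set.seed(1)
X <- cbind(1, matrix(rnorm(20 * 3), nrow = 20))

# Scalar penalty: per the argument description, applied to every column
# except the first (assumed to be the intercept).
lambda_scalar <- 10

# Vector penalty: one entry per column of X. A zero in the first position
# leaves the intercept unpenalised; the other values are illustrative.
lambda_vector <- c(0, 10, 10, 50)

# The vector form must match the number of columns of X.
stopifnot(length(lambda_vector) == ncol(X))
```

Either object could then be passed as the lambda argument of glm1.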

Details

This function fits a generalised linear model with a LASSO penalty, sometimes referred to as an L1 penalty or L1 norm, hence the name glm1. The model is fit using a local linearisation approach as in Osborne et al. (2000), nested inside iteratively reweighted (penalised) least squares. It is not the fastest option available: try glmnet if you want something faster (and possibly rougher as an approximation). The main advantage of the glm1 function is that it has been written to accept any glm family argument (although it has not yet been tested beyond discrete data), as well as the negative binomial distribution, which is especially useful for modelling overdispersed counts.

For negative binomial with unknown overdispersion use "negative.binomial", or if overdispersion is to be specified, use negative.binomial(theta) as in the MASS package. Note that the output refers to phi=1/theta, i.e. the overdispersion is parameterised such that the variance is mu+phi*mu^2. Hence values of phi close to zero suggest little overdispersion, values over one suggest a lot.
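
As a small illustration of the parameterisation described above, the following base R sketch converts a MASS-style theta to phi and evaluates the variance function mu+phi*mu^2. The helper name nb_variance and the numeric values are hypothetical and are not part of the package.

```r
# Variance of a negative binomial count under the phi parameterisation
# used in the output: Var(y) = mu + phi * mu^2.
# nb_variance is a hypothetical helper, not an mvabund function.
nb_variance <- function(mu, phi) mu + phi * mu^2

theta <- 2          # size parameter, as in negative.binomial(theta)
phi <- 1 / theta    # overdispersion on the scale reported in the output

nb_variance(mu = 10, phi = phi)  # 10 + 0.5 * 100 = 60
nb_variance(mu = 10, phi = 0)    # phi = 0 gives the Poisson variance, 10
```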

Value

coefficients

Vector of parameter estimates

fitted.values

Vector of predicted values (on scale of the original response)

logLs

Vector of log-likelihoods at each iteration of the model. The last entry is the log-likelihood for the final fit.

phis

Estimated overdispersion parameter at each iteration, for a negative binomial fit.

phi

Final estimate of the overdispersion parameter, for a negative binomial fit.

score

Vector of score equation values for each parameter in the model.

counter

Number of iterations until convergence. Set to Inf for a model that didn't converge.

check

Logical for whether the Karush-Kuhn-Tucker conditions are satisfied.
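
To give a sense of what such a check involves, here is a minimal sketch of the optimality conditions for an L1-penalised fit. It assumes the objective is the log-likelihood minus the sum of lambda_j*|beta_j|, and that score holds the unpenalised log-likelihood gradient. Both are assumptions, kkt_check is a hypothetical helper, and the check performed inside glm1 may differ.

```r
# Hypothetical helper, not part of mvabund: checks the optimality conditions
# for an L1-penalised fit, assuming `score` is the unpenalised
# log-likelihood gradient.
kkt_check <- function(score, beta, lambda, tol = 1e-6) {
  active <- beta != 0
  # Non-zero coefficients: the score must balance the penalty gradient.
  ok_active <- abs(score[active] - lambda[active] * sign(beta[active])) <= tol
  # Zero coefficients: the score must not exceed the penalty in magnitude.
  ok_zero <- abs(score[!active]) <= lambda[!active] + tol
  all(ok_active) && all(ok_zero)
}

# Illustrative values only.
beta   <- c(1.2, 0, -0.4)
lambda <- c(0, 10, 10)
kkt_check(score = c(0, 3, -10), beta = beta, lambda = lambda)   # TRUE
kkt_check(score = c(0, 12, -10), beta = beta, lambda = lambda)  # FALSE
```

In the second call the score for the zero coefficient exceeds its penalty, so that coefficient should have entered the model and the check fails.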

Author(s)

David I. Warton <David.Warton@unsw.edu.au>, Ian W. Renner and Luke Wilson.

References

Osborne, M.R., Presnell, B. and Turlach, B.A. (2000) On the LASSO and its dual. Journal of Computational and Graphical Statistics, 9, 319-337.

See Also

glm1path, glm1, glm, family

Examples

library(mvabund)
data(spider)
Alopacce <- spider$abund[,1]
X <- cbind(1,spider$x)
# fit a LASSO-penalised negative binomial GLM, with penalty parameter 10:
ft <- glm1(Alopacce, X, lambda = 10)

plot(ft$logLs) # log-likelihood at each iteration, up to convergence
coef(ft) # coefficients in the final model
