l0ara: Fit a generalized linear model with L0 penalty

View source: R/l0ara.R

l0ara (R Documentation)

Fit a generalized linear model with L0 penalty

Description

An adaptive ridge algorithm for feature selection with L0 penalty.

Usage

l0ara(x, y, family, lam, nonneg, standardize, maxit, eps)

Arguments

x

Design matrix of dimension nobs x nvars; each row is an observation vector and each column is a variable/feature. The design matrix should be sparse (of class dgCMatrix); if it is not, it should be coercible to that class. Regular matrices will be coerced to sparse matrices, and formulas will be converted to design matrices and coerced to sparse form.

y

Response variable. Quantitative for family="gaussian"; positive quantitative for family="gamma" or family="inv.gaussian"; a factor with two levels for family="logit"; non-negative counts for family="poisson".

family

Response type (see above).

lam

A user-supplied lambda value. If you have a sequence of lam values, use cv.l0ara first to select the optimal tuning parameter and then refit with lam.min. To use AIC, set lam=2; to use BIC, set lam=log(n).
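The cross-validation workflow described above can be sketched as follows. This is a hedged sketch: the exact cv.l0ara interface (a lam sequence argument and a lam.min component in the returned object) is assumed from the description here and from the See Also section, not verified against the package source.

```r
# Sketch: select lambda by cross-validation, then refit at lam.min.
# Assumes cv.l0ara() takes a `lam` sequence and returns `lam.min`.
library(l0ara)
set.seed(1)
n <- 100; p <- 40
x <- matrix(rnorm(n * p), n, p)
beta <- c(1, 0, 2, 3, rep(0, p - 4))
y <- x %*% beta + rnorm(n)
# cross-validate over a grid of candidate lambda values
cv.fit <- cv.l0ara(x, y, family = "gaussian",
                   lam = exp(seq(-2, 2, length.out = 10)))
# refit the model at the selected lambda
fit <- l0ara(x, y, family = "gaussian", lam = cv.fit$lam.min)
```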

nonneg

Boolean vector with length equal to the number of columns of x specifying which coefficients should be constrained to be nonnegative.
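A hedged sketch of how nonneg might be used, based only on the argument description above: the flag vector has one entry per column of x, and TRUE entries constrain the corresponding coefficients to be non-negative.

```r
# Sketch: constrain the first two coefficients to be non-negative,
# leave the remaining coefficients unconstrained.
library(l0ara)
set.seed(2)
n <- 100; p <- 10
x <- matrix(rnorm(n * p), n, p)
beta <- c(2, 1, rep(0, p - 2))
y <- x %*% beta + rnorm(n)
nn <- c(TRUE, TRUE, rep(FALSE, p - 2))  # length p: one flag per column of x
fit <- l0ara(x, y, family = "gaussian", lam = log(n), nonneg = nn)
```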

standardize

Logical flag for data normalization. If standardize=TRUE, independent variables in the design matrix x will be standardized with mean 0 and standard deviation 1; default value is FALSE.

maxit

Maximum number of passes over the data for the given lambda. Default value is 1e3.

eps

Convergence threshold. Default value is 1e-4.

Details

The sequence of models indexed by the parameter lambda is fit using the adaptive ridge algorithm. The objective function for generalized linear models (including the families above) is defined to be

-(log-likelihood) + (λ/2) * |β|_0

where |β|_0 is the number of non-zero elements in β. To select the "best" model with the AIC or BIC criterion, set lambda to 2 or log(n), respectively. The adaptive ridge algorithm approximates L0-penalized generalized linear models by sequential optimization and is efficient for high-dimensional data.

Value

An object with S3 class "l0ara" containing:

beta

A vector of coefficients

df

Number of nonzero coefficients

iter

Number of iterations

lambda

The lambda used

x

Design matrix

y

Response variable

Author(s)

Wenchuan Guo <wguo007@ucr.edu>, Shujie Ma <shujie.ma@ucr.edu>, Zhenqiu Liu <Zhenqiu.Liu@cshs.org>

See Also

cv.l0ara, predict.l0ara, coef.l0ara, plot.l0ara methods.

Examples

# Linear regression
# Generate design matrix and response variable
n <- 100
p <- 40
x <- matrix(rnorm(n*p), n, p)
beta <- c(1,0,2,3,rep(0,p-4))
noise <- rnorm(n)
y <- x%*%beta+noise
# fit sparse linear regression using BIC
res.gaussian <- l0ara(x, y, family="gaussian", lam=log(n))

# predict for new observations
print(res.gaussian)
predict(res.gaussian, newx=matrix(rnorm(3*p), 3, p))
coef(res.gaussian)

# Logistic regression
# Generate design matrix and response variable
n <- 100
p <- 40
x <- matrix(rnorm(n*p), n, p)
beta <- c(1,0,2,3,rep(0,p-4))
prob <- exp(x%*%beta)/(1+exp(x%*%beta))
y <- rbinom(n, rep(1,n), prob)
# fit sparse logistic regression
res.logit <- l0ara(x, y, family="logit", lam=0.7)

# predict for new observations
print(res.logit)
predict(res.logit, newx=matrix(rnorm(3*p), 3, p))
coef(res.logit)

# Poisson regression
# Generate design matrix and response variable
n <- 100
p <- 40
x <- matrix(rnorm(n*p), n, p)
beta <- c(1,0,0.5,0.3,rep(0,p-4))
mu <- exp(x%*%beta)
y <- rpois(n, mu)
# fit sparse Poisson regression using AIC
res.pois <- l0ara(x, y, family="poisson", lam=2)

# predict for new observations
print(res.pois)
predict(res.pois, newx=matrix(rnorm(3*p), 3, p))
coef(res.pois)
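The gamma and inverse-Gaussian families listed under the y argument can be exercised in the same style as the examples above. This sketch assumes family="gamma" uses a log link, which is not stated on this page; the shape parameter of the simulated response is an arbitrary illustration value.

```r
# Gamma regression (sketch; log link assumed)
# Generate design matrix and positive response variable
library(l0ara)
set.seed(3)
n <- 100; p <- 40
x <- matrix(rnorm(n * p), n, p)
beta <- c(0.5, 0, 0.3, 0.2, rep(0, p - 4))
mu <- exp(x %*% beta)
y <- rgamma(n, shape = 2, rate = 2 / mu)  # positive, with mean mu
# fit sparse gamma regression using AIC
res.gamma <- l0ara(x, y, family = "gamma", lam = 2)
print(res.gamma)
coef(res.gamma)
```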

tomwenseleers/l0ara documentation built on Sept. 21, 2022, 5:20 p.m.