islasso: The Induced Smoothed lasso

islasso    R Documentation

The Induced Smoothed lasso

Description

islasso fits lasso regression models in which the nonsmooth L_1 norm penalty is replaced by a smooth approximation justified under the induced smoothing paradigm. Simple lasso-type or elastic-net penalties are permitted, and linear, logistic, Poisson and Gamma regression models are supported.

Usage

islasso(formula, family = gaussian, lambda, alpha = 1, data, weights, subset,
        offset, unpenalized, contrasts = NULL, control = is.control())

Arguments

formula

an object of class “formula” (or one that can be coerced to that class): the ‘usual’ symbolic description of the model to be fitted.

family

the assumed response distribution. Gaussian, (quasi) Binomial, (quasi) Poisson, and Gamma are allowed. family=gaussian is implemented with identity link, family=binomial is implemented with logit or probit links, family=poisson is implemented with log link, and family=Gamma is implemented with inverse, log and identity links.
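As a small illustration of the link choices listed for family, the family objects from base R's stats package can be built and inspected before being passed on. This sketch uses base R only and does not call islasso itself; the variable names are arbitrary.

```r
# Family objects from base R (stats); each one records the name of its link.
fam_gauss  <- gaussian()                 # identity link
fam_probit <- binomial(link = "probit")  # probit instead of the default logit
fam_pois   <- poisson()                  # log link
fam_gamma  <- Gamma(link = "log")        # log instead of the default inverse

links <- c(gaussian = fam_gauss$link,
           binomial = fam_probit$link,
           poisson  = fam_pois$link,
           Gamma    = fam_gamma$link)
print(links)
```

Any of these objects could then be supplied as the family argument, for example family = Gamma(link = "log") as in the Examples section.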

lambda

Value of the tuning parameter in the objective. If missing, the optimal lambda is computed using cv.glmnet.

alpha

The elastic-net mixing parameter, with 0\le\alpha\le 1. The penalty is defined as

(1-\alpha)/2||\beta||_2^2+\alpha||\beta||_1.

alpha=1 is the lasso penalty, and alpha=0 the ridge penalty.
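To make the role of alpha concrete, the penalty above can be evaluated directly in base R. The helper enet_penalty is hypothetical and written only for this illustration; it is not part of islasso and it ignores the tuning parameter lambda.

```r
# Elastic-net penalty: (1 - alpha)/2 * ||beta||_2^2 + alpha * ||beta||_1
# `enet_penalty` is a hypothetical helper, not an islasso function.
enet_penalty <- function(beta, alpha = 1) {
  stopifnot(alpha >= 0, alpha <= 1)
  (1 - alpha) / 2 * sum(beta^2) + alpha * sum(abs(beta))
}

beta <- c(1, -2, 0)
enet_penalty(beta, alpha = 1)    # lasso: sum of absolute values
enet_penalty(beta, alpha = 0)    # ridge: half the sum of squares
enet_penalty(beta, alpha = 0.5)  # equal mix of the two
```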

data

an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which islasso is called.

weights

observation weights. Default is 1 for each observation.

subset

an optional vector specifying a subset of observations to be used in the fitting process.

offset

this can be used to specify an a priori known component to be included in the linear predictor during fitting. This should be NULL or a numeric vector of length equal to the number of cases.

unpenalized

optional. A vector of integers or characters indicating any covariate (in the formula) with coefficients not to be penalized. The intercept, if included in the model, is always unpenalized.

contrasts

an optional list. See the contrasts.arg of model.matrix.default.

control

a list of parameters for controlling the fitting process (see is.control for more details).

Details

islasso estimates regression models by imposing a lasso-type penalty on some or all regression coefficients. However, the nonsmooth L_1 norm penalty is replaced by a smooth approximation justified under the induced smoothing paradigm. The advantage is that reliable standard errors are returned as model output, and hypothesis tests on linear combinations of the regression parameters can be carried out straightforwardly via the Wald statistic. Simulation studies provide evidence that the proposed approach controls type-I errors and exhibits good power in different scenarios.
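The idea of a smooth replacement for the absolute value can be sketched in base R. The example assumes the surrogate is the expected absolute value of a normally perturbed coefficient, that is, the mean of a folded normal distribution. The helper smooth_abs and the scale s are illustrative assumptions, and the package may compute its approximation differently.

```r
# Smooth surrogate for |beta|: E|beta + s * Z| with Z ~ N(0, 1),
# i.e. the mean of a folded normal distribution.
# `smooth_abs` and the scale `s` are illustrative, not islasso internals.
smooth_abs <- function(beta, s) {
  beta * (2 * pnorm(beta / s) - 1) + 2 * s * dnorm(beta / s)
}

beta <- c(-2, -0.1, 0, 0.1, 2)
cbind(beta,
      abs    = abs(beta),
      s_0.5  = smooth_abs(beta, 0.5),   # visibly rounded near zero
      s_0.01 = smooth_abs(beta, 0.01))  # almost indistinguishable from abs()
```

Unlike abs(), the surrogate is differentiable at zero, and it approaches abs() as s shrinks, which is what makes derivative-based standard errors available.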

Value

A list with the following components:

coefficients

a named vector of coefficients

se

a named vector of standard errors

residuals

the working residuals

fitted.values

the fitted values

rank

the estimated degrees of freedom

family

the family object used

linear.predictors

the linear predictors

deviance

the family deviance

aic

the Akaike Information Criterion

null.deviance

the family null deviance

iter

the number of iterations of IWLS used

weights

the working weights, that is the weights in the final iteration of the IWLS fit

df.residual

the residual degrees of freedom

df.null

the degrees of freedom of a null model

converged

logical. Was the IWLS algorithm judged to have converged?

model

if requested (the default), the model frame used.

call

the matched call

formula

the formula supplied

terms

the terms object used

data

the data argument.

offset

the offset vector used.

control

the value of the control argument used

xlevels

(where relevant) a record of the levels of the factors used in fitting.

lambda

the lambda value used in the islasso algorithm

alpha

the elastic-net mixing parameter

dispersion

the estimated dispersion parameter

internal

internal elements

contrasts

(only where relevant) the contrasts used.
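As a rough sketch of how the coefficients and se components relate to the Wald tests mentioned in Details, the arithmetic can be reproduced in base R. The list fit below is a hand-made stand-in with made-up numbers, not output from islasso, and a standard normal reference distribution is assumed; summary.islasso remains the documented route.

```r
# Stand-in for the returned list: only the `coefficients` and `se`
# components described above, filled with made-up numbers.
fit <- list(
  coefficients = c("(Intercept)" = 0.50, X1 = 1.20, X2 = -0.03),
  se           = c("(Intercept)" = 0.10, X1 = 0.30, X2 = 0.15)
)

z <- fit$coefficients / fit$se              # Wald z statistics
p <- 2 * pnorm(abs(z), lower.tail = FALSE)  # two-sided p-values, normal reference
round(cbind(Estimate     = fit$coefficients,
            "Std. Error" = fit$se,
            "z value"    = z,
            "Pr(>|z|)"   = p), 4)
```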

Author(s)

The main function of the same name was inspired by the R function previously implemented by Vito MR Muggeo.

Maintainer: Gianluca Sottile <gianluca.sottile@unipa.it>

References

Cilluffo, G, Sottile, G, La Grutta, S and Muggeo, VMR (2019). The Induced Smoothed lasso: A practical framework for hypothesis testing in high dimensional regression. Statistical Methods in Medical Research, DOI: 10.1177/0962280219842890.

Sottile, G, Cilluffo, G, Muggeo, VMR (2019). The R package islasso: estimation and hypothesis testing in lasso regression. Technical Report on ResearchGate. doi:10.13140/RG.2.2.16360.11521.

See Also

islasso.fit, summary.islasso, residuals.islasso, logLik.islasso, predict.islasso and deviance.islasso methods.

Examples


set.seed(1)
n <- 100
p <- 100
p1 <- 10  # number of nonzero coefficients
coef.veri <- sort(round(c(seq(.5, 3, length.out = p1/2), seq(-1, -2, length.out = p1/2)), 2))
sigma <- 1

coef <- c(coef.veri, rep(0, p-p1))

X <- matrix(rnorm(n*p), n, p)
eta <- drop(X%*%coef)

##### gaussian ######
mu <- eta
y <- mu + rnorm(n, 0, sigma)

o <- islasso(y ~ ., data = data.frame(y = y, X), 
             family = gaussian())
o
summary(o)
coef(o)
fitted(o)
predict(o, type="response")
plot(o)
residuals(o)
deviance(o)
AIC(o)
logLik(o)

## Not run: 
# for the interaction
o <- islasso(y ~ X1 * X2, data = data.frame(y = y, X), 
             family = gaussian())

##### binomial ######
coef <- c(c(1,1,1), rep(0, p-3))
X <- matrix(rnorm(n*p), n, p)
eta <- drop(cbind(1, X)%*%c(-1, coef))
mu <- binomial()$linkinv(eta)
y <- rbinom(n, 100, mu)
y <- cbind(y, 100-y)

o <- islasso(cbind(y1, y2) ~ ., 
             data = data.frame(y1 = y[,1], y2 = y[,2], X), 
             family = binomial())
summary(o, pval = .05)

##### poisson ######
coef <- c(c(1,1,1), rep(0, p-3))
X <- matrix(rnorm(n*p), n, p)
eta <- drop(cbind(1, X)%*%c(1, coef))
mu <- poisson()$linkinv(eta)
y <- rpois(n, mu)

o <- islasso(y ~ ., data = data.frame(y = y, X), 
             family = poisson())
summary(o, pval = .05)

##### Gamma ######
coef <- c(c(1,1,1), rep(0, p-3))
X <- matrix(rnorm(n*p), n, p)
eta <- drop(cbind(1, X)%*%c(-1, coef))
mu <- Gamma(link="log")$linkinv(eta)
shape <- 10
phi <- 1 / shape
y <- rgamma(n, scale = mu / shape, shape = shape)

o <- islasso(y ~ ., data = data.frame(y = y, X), 
             family = Gamma(link = "log"))
summary(o, pval = .05)

## End(Not run)

islasso documentation built on May 31, 2023, 8:37 p.m.