```r
# knitr setup for this vignette
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
```
In this short note we present and briefly discuss the R package islasso, which deals with regression models having a large number of covariates. Estimation is carried out by penalizing the coefficients via a quasi-lasso penalty, wherein the nonsmooth lasso penalty is replaced by a smooth counterpart determined iteratively from the data according to the induced smoothing idea. The package includes functions to estimate the model and to test linear hypotheses on linear combinations of relevant coefficients. We illustrate the R code through a worked example, intentionally omitting technical details and an extended bibliography.
Let $\mathbf{y} = \mathbf{X}\mathbf{\beta} + \mathbf{\epsilon}$ be the linear model of interest, with the usual zero-mean and homoscedastic errors. As usual, $\mathbf{y} = (y_1,\ldots,y_n)^T$ is the response vector and $\mathbf{X}$ is the $n \times p$ design matrix (with $p$ quite large) with regression coefficients $\mathbf{\beta}$. When interest lies in selecting the non-noise covariates and estimating the relevant effects, one minimizes the lasso penalized objective function (Tibshirani, 1996), $$\frac{1}{2}||\mathbf{y}-\mathbf{X}\mathbf{\beta}||_2^2+\lambda||\mathbf{\beta}||_1.$$
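To make the notation concrete, the objective above can be coded in a few lines of R. The following is only a minimal sketch for illustration, not part of the islasso package; obj_lasso, beta, X, y and lambda are names introduced here.

```r
# Minimal sketch (not part of islasso): the lasso objective
# 0.5 * ||y - X beta||_2^2 + lambda * ||beta||_1
# for a candidate coefficient vector `beta`.
obj_lasso <- function(beta, X, y, lambda) {
  0.5 * sum((y - X %*% beta)^2) + lambda * sum(abs(beta))
}
```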
The main functions of the package are islasso(), where the user supplies the model formula as in the usual lm or glm functions, i.e.
```r
islasso(formula, family = gaussian, lambda, alpha = 1, data, weights, subset,
        offset, unpenalized, contrasts = NULL, control = is.control())
```
and islasso.path(), used to fit the regularization path via the induced smoothed lasso framework, i.e.
```r
islasso.path(formula, family = gaussian, lambda = NULL, nlambda = 100,
             lambda.min.ratio = ifelse(nobs < nvars, 1E-03, 1E-05), alpha = 1,
             data, weights, subset, offset, unpenalized, contrasts = NULL,
             control = is.control())
```
family accepts the specification of the family and link function as in Table 1, lambda is the tuning parameter, alpha is the elastic-net mixing parameter, nlambda is the number of lambda values, lambda.min.ratio is the smallest value of lambda (as a fraction of lambda.max), and unpenalized indicates covariates whose coefficients should not be penalized; a schematic example call is given after Table 1.
Table 1. Families and link functions allowed in islasso
| family | link |
|:-------|:----:|
| gaussian | identity |
| binomial | logit, probit |
| poisson | log |
| gamma | identity, log, inverse |
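As a schematic illustration of these arguments, the sketch below fits a probit model on simulated data while keeping one coefficient unpenalized. The data and variable names (mydata, y01, x1, ...) are invented for the example, and the exact format accepted by unpenalized (variable names versus indices) should be checked in ?islasso.

```r
library(islasso)

# simulated data, for illustration only
set.seed(1)
mydata <- data.frame(y01 = rbinom(100, 1, 0.5),
                     matrix(rnorm(100 * 5), 100, 5))
names(mydata)[-1] <- paste0("x", 1:5)

# binomial family with probit link (see Table 1); x1 is assumed to be
# excludable from the penalty by name via `unpenalized` (check ?islasso)
fit <- islasso(y01 ~ ., family = binomial(link = "probit"),
               data = mydata, unpenalized = "x1")
```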
The fitter functions are `islasso.fit()` and `islasso.path.fit()`, which read as
```r
islasso.fit(X, y, family = gaussian(), lambda, alpha = 1, intercept = FALSE,
            weights = NULL, offset = NULL, unpenalized = NULL,
            control = is.control())
```
and
```r
islasso.path.fit(X, y, family = gaussian(), lambda, nlambda, lambda.min.ratio,
                 alpha = 1, intercept = FALSE, weights = NULL, offset = NULL,
                 unpenalized = NULL, control = is.control())
```
which actually implement the estimating algorithm as described in the paper. The lambda argument in islasso.fit and islasso specifies the positive tuning parameter in the penalized objective. Any non-negative value can be provided, but if it is missing, it is computed via $K$-fold cross-validation by the function cv.glmnet() from package glmnet. The number of folds can be specified via the argument nfolds of the auxiliary function is.control(). The lambda argument in islasso.path.fit and islasso.path specifies the sequence of positive tuning parameters, either user-supplied or automatically computed from nlambda and lambda.min.ratio.
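Both mechanisms can be illustrated with simulated data. In the sketch below (data and variable names are arbitrary), the first call lets islasso() compute lambda by 5-fold cross-validation through the nfolds argument of is.control(), while the second passes a user-supplied decreasing lambda sequence to islasso.path().

```r
library(islasso)

# simulated data, for illustration only
set.seed(2)
mydata <- data.frame(y = rnorm(100), matrix(rnorm(100 * 10), 100, 10))

# lambda missing: computed by 5-fold cross-validation via cv.glmnet()
fit1 <- islasso(y ~ ., data = mydata, control = is.control(nfolds = 5))

# user-supplied decreasing sequence of tuning parameters for the path
fit2 <- islasso.path(y ~ ., data = mydata,
                     lambda = exp(seq(log(100), log(0.1), length.out = 50)))
```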
The Prostate dataset is a well-known benchmark in regression analysis and variable selection, originally published by Stamey et al. (1989). It contains clinical measurements from 97 male patients diagnosed with prostate cancer.
The primary goal is to model the level of prostate-specific antigen (PSA) based on several predictive variables, including:
- lcavol: log cancer volume
- lweight: log prostate weight
- age: patient's age
- lbph: log of benign prostatic hyperplasia amount
- svi: seminal vesicle invasion
- lcp: log capsular penetration
- gleason: Gleason score (histological grade of tumor)
- pgg45: percent of Gleason scores 4 or 5

The response variable is lpsa, the logarithm of PSA levels.
This dataset is particularly useful for testing variable-selection methods due to its moderate size and the presence of multicollinearity among the predictors.
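The size of the dataset and the correlation among (some of) the predictors can be checked with a couple of lines of standard exploratory code; this is not part of the analysis that follows.

```r
# quick exploratory check: dimensions and correlations among some of the
# continuous predictors
data("Prostate", package = "islasso")
dim(Prostate)
round(cor(Prostate[, c("lcavol", "lweight", "age", "lbph", "lcp", "pgg45")]), 2)
```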
To select the important terms in the regression equation, we could simply apply the lasso using the R package glmnet:
```r
library(islasso)
data("Prostate", package = "islasso")

# design matrix (without the intercept column) and response
x <- model.matrix(lpsa ~ ., data = Prostate)[, -1]
y <- Prostate$lpsa

# ten-fold cross-validation via glmnet
a1 <- cv.glmnet(x, y, family = "gaussian")
n <- nrow(Prostate)
a1$lambda.min * n

# coefficients at the selected lambda and number of non-null estimates
b <- drop(coef(a1, "lambda.min", exact = TRUE))
length(b[b != 0])
```
Ten-fold cross-validation "selects" $\lambda=$ `r round(a1$lambda.min * n, 3)`, corresponding to `r length(b[b != 0])` non-null coefficients:
```r
names(b[b != 0])
```
The last three estimates are
```r
tail(b[b != 0], n = 3)
```
A reasonable question is whether all the "selected" coefficients are significant in the model. Unfortunately, lasso regression does not return standard errors due to the nonsmoothness of the objective, and several alternative approaches have been proposed. Among the (few) strategies, including the 'covariance test' (Lockhart et al., 2013), 'post-selection inference' and the '(modified) residual bootstrap', here we illustrate the R package islasso, which implements the recent 'quasi-lasso' approach based on the induced smoothing idea (Brown and Wang, 2005), as discussed in Cilluffo et al. (2019).
While the optimal lambda could be selected automatically (by not supplying any value to lambda), here we use the value minimizing a specific criterion chosen among AIC, BIC, AICc, eBIC, GCV or GIC. From version 1.4.0 of the R package islasso, the recommended strategy is to build the regularization path first:
```r
out <- islasso.path(lpsa ~ ., data = Prostate, nlambda = 50L, family = gaussian())
out
```
and then to choose the best tuning parameter through one of the criteria listed above using the function GoF.islasso.path(), e.g.,
```r
lmb.best <- GoF.islasso.path(out)
lmb.best$lambda.min
```
The regularization path is also very useful to gain more insight into the coefficient, standard error and gradient profiles:
```r
p1 <- plot(out, yvar = "coefficients")
p2 <- plot(out, yvar = "se")
p3 <- plot(out, yvar = "gradient")
gridExtra::grid.arrange(p1, p2, p3, ncol = 1L)
```
Once the best lambda value has been selected, e.g. by minimizing the AIC criterion, the last step of the strategy consists in fitting a new islasso model:
```r
lambda.aic <- lmb.best$lambda.min["AIC"]
out2 <- islasso(lpsa ~ ., data = Prostate, lambda = lambda.aic, family = gaussian())
out2
```
The summary method quickly returns the main output of the fitted model, including point estimates, standard errors and $p$-values. Visualizing estimates for all covariates can be somewhat inconvenient, especially when the number of covariates is large, thus we choose to print estimates only if the $p$-value is less than a threshold value; here we use 0.10:
```r
summary(out2, pval = 0.10)
```
In addition to the usual information printed by the summary method, the output also includes the column Df, representing the degrees of freedom of each coefficient. Their sum is used to quantify the model complexity:
```r
sum(out2$internal$hi)
```
and the corresponding residual degrees of freedom (`r out2$internal$n - out2$rank`) as reported above. The Wald test (column z value) and the $p$-values can be used to assess important or significant covariates. Results suggest that the variables lcavol, lweight and svi are informative for predicting the logarithm of PSA levels, while lbph is borderline informative. Another way to obtain a similar result, without computing the regularization path, is to use the function aic.islasso(), which requires a preliminary islasso fit object and the specification of the criterion to be used. Hence
```r
lambda.aic2 <- aic.islasso(out2, method = "AIC", interval = c(.1, 50))
out3 <- update(out2, lambda = lambda.aic2)
summary(out3, pval = .10)
```
Comparisons between methods to select the tuning parameter and further discussions are beyond the scope of this short note. We conclude by emphasizing that islasso also accepts the so-called elastic-net penalty, $$\frac{1}{2}||\mathbf{y}-\mathbf{X}\mathbf{\beta}||_2^{2}+\lambda\left\{\alpha||\mathbf{\beta}||_1+\frac{1}{2}(1-\alpha)||\mathbf{\beta}||^{2}_2\right\}$$ where $0\le \alpha\le 1$ is the mixing parameter to be specified in islasso() and islasso.path() via the argument alpha, e.g.
```r
# update the islasso path to fit an elastic-net model
out4 <- update(out, alpha = .5)
out4

# some diagnostic plots
p4 <- plot(out4, yvar = "coefficients")
p5 <- plot(out4, yvar = "se")
gridExtra::grid.arrange(p4, p5, ncol = 1L)

# select the best tuning parameter
lmb.best2 <- GoF.islasso.path(out4)
lmb.best2$lambda.min

# fit a new islasso model with elastic-net penalty
lambda.aic3 <- lmb.best2$lambda.min["AIC"]
out5 <- update(out2, alpha = .5, lambda = lambda.aic3)
summary(out5, pval = .10)

# or select the best tuning parameter using AIC with an islasso object
lambda.aic4 <- aic.islasso(out5, method = "AIC", interval = c(.1, 100))
out6 <- update(out5, lambda = lambda.aic4)
summary(out6, pval = .10)
```