Introduction to the EAlasso package
In EAlasso: Simulation Based Inference of Lasso Estimator

EAlasso package is designed to facilitate simulation based inference of lasso type estimators. The package includes

Computing the lasso type estimator
Gaussian bootstrap and wild multiplier bootstrap sampler
Importance sampler
Metropolis-Hastings sampler for lasso estimator
Post-selection inference for lasso estimator

Loss function for lasso-type estimators

Let the unknown true model be $$ y = X\beta_0 + \epsilon,$$ where $\beta_0$ is unknown true coefficient and $\epsilon_i \sim_{iid} N(0,\sigma^2)$.

We use following loss functions. $$ \ell_{grlasso}(\beta ; \lambda) = \frac{1}{2}||y-X\beta||^2_2 + n\lambda\sum_j w_j||\beta_{(j)}||2,$$ $$ \ell{sgrlasso}(\beta, \sigma ; \lambda) = \frac{1}{2\sigma}||y-X\beta||^2_2 + \frac{\sigma}{2} + n\lambda\sum_j w_j||\beta_{(j)}||_2,$$ where $grlasso$ and $sgrlasso$ represent group lasso and scaled group lasso, respectively, and $w_j$ is the weight factor for the j-th group. Loss functions for lasso and scaled lasso can be treated as special cases of group lasso and group scaled lasso when the group size is one, respectively.

Fit Lasso Type Estimator

Lasso.MHLS computes lasso, group lasso, scaled lasso, or scaled group lasso solution. Users can either provide the value of $\lambda$ or choose to use cross-validation by setting lbd = "cv.min" or lbd = "cv.1se".

library(EAlasso)
set.seed(1234)
n <- 30; p <- 50
Niter <-  10
group <- rep(1:(p/5), each = 5)
weights <- rep(1, p/5)
beta0 <- c(rep(1,5), rep(0, p-5))
X <- matrix(rnorm(n*p), n)
Y <- X %*% beta0 + rnorm(n)
# set lambda = .5
Lasso.MHLS(X = X, Y = Y, lbd = .5, group = group, weights = weights, type="grlasso")
# use cross-validation
Lasso.MHLS(X = X, Y = Y, lbd = "cv.1se", group = group, weights = weights, type="grlasso")

Parametric Bootstrap Sampler

PBsampler supports two bootstrap methods; Gaussian bootstrap and wild multiplier bootstrap. Due to the fact that the sampling distirbution of $(\hat{\beta}, S)$, coefficient estimator and its subgradient, is characterized by $(\beta_0, \sigma^2, \lambda)$, users are required to provide PE(either the estimate of $\beta_0$ or the estimate of $E(y) = X\beta_0$), sig2(or estimate of $\sigma^2$) and lbd($\lambda$ from above loss functions).

By specifying two sets of arguments, (PE_1, sig2_1, lbd_1) and (PE_2, sig2_2, lbd_2), users can sample from the mixture distribution. In this way, samples will be drawn from (PE_1, sig2_1, lbd_1) with 1/2 probability and from (PE_2, sig2_2, lbd_2) with another 1/2 probability.

set.seed(1234)
n <- 5; p <- 10
Niter <-  3
group <- rep(1:(p/5), each = 5)
weights <- rep(1, p/5)
beta0 <- c(rep(1,5), rep(0, p-5))
X <- matrix(rnorm(n*p), n)
Y <- X %*% beta0 + rnorm(n)
#
# Using non-mixture distribution
#
PBsampler(X = X, PE_1 = rep(0, p), sig2_1 = 1, lbd_1 = .5,
weights = weights, group = group, type = "grlasso", niter = Niter, parallel = FALSE)
#
# Using mixture distribution
#
PBsampler(X = X, PE_1 = rep(0, p), sig2_1 = 1, lbd_1 = .5,
 PE_2 = rep(1, p), sig2_2 = 2, lbd_2 = .3, weights = weights,
 group = group, type = "grlasso", niter = Niter, parallel = FALSE)

Importance Sampler for High Dimensional Data

Importance Sampler enables users to access the inference of an extreme region. This is done by using a proposal distiribution that is denser around the extreme region.

Say that we are interested in computing the expectation of a function of ㅁ random variable, $h(X)$. Let $f(x)$ be the true or target distribution and $g(x)$ be the proposal distribution. We can approximate the expectation by a weighted average of samples drawn from the proposal distribution as follows,

$$ \begin{eqnarray} E_f\Big[h(X)\Big] &=& E_g \Big[h(X)\frac{f(X)}{g(X)} \Big]\ &\approx& \frac{1}{N}\sum_{i=1}^N h(x_i)\frac{f(x_i)}{g(x_i)}. \end{eqnarray} $$ By using hdIS method, one can easily compute the importance weight which is the ratio of target density over proposal density; $f(x_i)/g(x_i)$ from above equation.

Users can simply draw samples from the proposal distribution using PBsampler and plug in the result into hdIS with target distribution parameters in order to compute the importance weights.

# Target distribution parameter
PETarget <- rep(0, p)
sig2Target <- .5
lbdTarget <- .37
# Proposal distribution parameter
PEProp1 <- rep(1, p)
sig2Prop1 <- .5
lbdProp1 <- 1

# Draw samples from proposal distribution
PB <- PBsampler(X = X, PE_1 = PEProp1, sig2_1 = sig2Prop1,
 lbd_1 = lbdProp1, weights = weights, group = group, niter = Niter,
 type="grlasso", PEtype = "coeff")

# Compute importance weights
hdIS(PB, PETarget = PETarget, sig2Target = sig2Target, lbdTarget = lbdTarget,
log = TRUE)

Metropolis-Hastings Lasso Sampler

In this section, we introduce MHLS method, a Markov Chain Monte Carlo(MCMC) sampler for lasso estimator. Although bootstrapping is one of the most convenient sampling methods, it has a clear limitation which is that sampling from the conditional distribution is impossible. In contrast, MCMC sampler can easily draw samples from the conditional distribution. Here, we introduce MHLS function which draws lasso samples under the fixed active set, A.

weights <- rep(1,p)
lbd <- .37
LassoResult <- Lasso.MHLS(X = X,Y = Y,lbd = lbd, type = "lasso", weights = weights)
B0 <- LassoResult$B0
S0 <- LassoResult$S0
Result <- MHLS(X = X, PE = B0, sig2 = 1, lbd = 1,
    weights = weights, B0 = B0, S0 = S0, niter = 100, burnin = 0,
    PEtype = "coeff")
Result

We provide summary and plot functions for MHLS results.

summary(Result)

plot(Result, index=c(1,4,9))

Post-selection Inference

Postinference.MHLS is a method for post-selection inference. The inference is provided by MHLS results from multiple chains. In order to make the method more robust, different PE values are used for each chain. After drawing samples of $(\hat{\beta}, S)$ from MH sampler, we then refit the estimator to remove the bias of the lasso estimator. The final output will be the $(1-a)$ quantile of each active coefficient.

Postinference.MHLS(X = X, Y = Y, lbd = lbd, sig2.hat = 1, alpha = .05,
 nChain = 5, niterPerChain = 20, parallel = !(.Platform$OS.type == "windows"))

Any scripts or data that you put into this service are public.

EAlasso documentation built on Sept. 1, 2017, 9:03 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

EAlasso
Simulation Based Inference of Lasso Estimator

Introduction to the EAlasso package
In EAlasso: Simulation Based Inference of Lasso Estimator

Loss function for lasso-type estimators

Fit Lasso Type Estimator

Parametric Bootstrap Sampler

Importance Sampler for High Dimensional Data

Metropolis-Hastings Lasso Sampler

Post-selection Inference

Try the EAlasso package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

EAlasso Simulation Based Inference of Lasso Estimator

Introduction to the EAlasso package In EAlasso: Simulation Based Inference of Lasso Estimator

Loss function for lasso-type estimators

Fit Lasso Type Estimator

Parametric Bootstrap Sampler

Importance Sampler for High Dimensional Data

Metropolis-Hastings Lasso Sampler

Post-selection Inference

Try the EAlasso package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

EAlasso
Simulation Based Inference of Lasso Estimator

Introduction to the EAlasso package
In EAlasso: Simulation Based Inference of Lasso Estimator