# Introduction to the EAlasso package In EAlasso: Simulation Based Inference of Lasso Estimator

EAlasso package is designed to facilitate simulation based inference of lasso type estimators. The package includes

• Computing the lasso type estimator
• Gaussian bootstrap and wild multiplier bootstrap sampler
• Importance sampler
• Metropolis-Hastings sampler for lasso estimator
• Post-selection inference for lasso estimator

## Loss function for lasso-type estimators

Let the unknown true model be $$y = X\beta_0 + \epsilon,$$ where $\beta_0$ is unknown true coefficient and $\epsilon_i \sim_{iid} N(0,\sigma^2)$.

We use following loss functions. $$\ell_{grlasso}(\beta ; \lambda) = \frac{1}{2}||y-X\beta||^2_2 + n\lambda\sum_j w_j||\beta_{(j)}||2,$$ $$\ell{sgrlasso}(\beta, \sigma ; \lambda) = \frac{1}{2\sigma}||y-X\beta||^2_2 + \frac{\sigma}{2} + n\lambda\sum_j w_j||\beta_{(j)}||_2,$$ where $grlasso$ and $sgrlasso$ represent group lasso and scaled group lasso, respectively, and $w_j$ is the weight factor for the j-th group. Loss functions for lasso and scaled lasso can be treated as special cases of group lasso and group scaled lasso when the group size is one, respectively.

## Fit Lasso Type Estimator

Lasso.MHLS computes lasso, group lasso, scaled lasso, or scaled group lasso solution. Users can either provide the value of $\lambda$ or choose to use cross-validation by setting lbd = "cv.min" or lbd = "cv.1se".

library(EAlasso)
set.seed(1234)
n <- 30; p <- 50
Niter <-  10
group <- rep(1:(p/5), each = 5)
weights <- rep(1, p/5)
beta0 <- c(rep(1,5), rep(0, p-5))
X <- matrix(rnorm(n*p), n)
Y <- X %*% beta0 + rnorm(n)
# set lambda = .5
Lasso.MHLS(X = X, Y = Y, lbd = .5, group = group, weights = weights, type="grlasso")
# use cross-validation
Lasso.MHLS(X = X, Y = Y, lbd = "cv.1se", group = group, weights = weights, type="grlasso")


## Parametric Bootstrap Sampler

PBsampler supports two bootstrap methods; Gaussian bootstrap and wild multiplier bootstrap. Due to the fact that the sampling distirbution of $(\hat{\beta}, S)$, coefficient estimator and its subgradient, is characterized by $(\beta_0, \sigma^2, \lambda)$, users are required to provide PE(either the estimate of $\beta_0$ or the estimate of $E(y) = X\beta_0$), sig2(or estimate of $\sigma^2$) and lbd($\lambda$ from above loss functions).

By specifying two sets of arguments, (PE_1, sig2_1, lbd_1) and (PE_2, sig2_2, lbd_2), users can sample from the mixture distribution. In this way, samples will be drawn from (PE_1, sig2_1, lbd_1) with 1/2 probability and from (PE_2, sig2_2, lbd_2) with another 1/2 probability.

set.seed(1234)
n <- 5; p <- 10
Niter <-  3
group <- rep(1:(p/5), each = 5)
weights <- rep(1, p/5)
beta0 <- c(rep(1,5), rep(0, p-5))
X <- matrix(rnorm(n*p), n)
Y <- X %*% beta0 + rnorm(n)
#
# Using non-mixture distribution
#
PBsampler(X = X, PE_1 = rep(0, p), sig2_1 = 1, lbd_1 = .5,
weights = weights, group = group, type = "grlasso", niter = Niter, parallel = FALSE)
#
# Using mixture distribution
#
PBsampler(X = X, PE_1 = rep(0, p), sig2_1 = 1, lbd_1 = .5,
PE_2 = rep(1, p), sig2_2 = 2, lbd_2 = .3, weights = weights,
group = group, type = "grlasso", niter = Niter, parallel = FALSE)


## Importance Sampler for High Dimensional Data

Importance Sampler enables users to access the inference of an extreme region. This is done by using a proposal distiribution that is denser around the extreme region.

Say that we are interested in computing the expectation of a function of ㅁ random variable, $h(X)$. Let $f(x)$ be the true or target distribution and $g(x)$ be the proposal distribution. We can approximate the expectation by a weighted average of samples drawn from the proposal distribution as follows,

$$\begin{eqnarray} E_f\Big[h(X)\Big] &=& E_g \Big[h(X)\frac{f(X)}{g(X)} \Big]\ &\approx& \frac{1}{N}\sum_{i=1}^N h(x_i)\frac{f(x_i)}{g(x_i)}. \end{eqnarray}$$ By using hdIS method, one can easily compute the importance weight which is the ratio of target density over proposal density; $f(x_i)/g(x_i)$ from above equation.

Users can simply draw samples from the proposal distribution using PBsampler and plug in the result into hdIS with target distribution parameters in order to compute the importance weights.

# Target distribution parameter
PETarget <- rep(0, p)
sig2Target <- .5
lbdTarget <- .37
# Proposal distribution parameter
PEProp1 <- rep(1, p)
sig2Prop1 <- .5
lbdProp1 <- 1

# Draw samples from proposal distribution
PB <- PBsampler(X = X, PE_1 = PEProp1, sig2_1 = sig2Prop1,
lbd_1 = lbdProp1, weights = weights, group = group, niter = Niter,
type="grlasso", PEtype = "coeff")

# Compute importance weights
hdIS(PB, PETarget = PETarget, sig2Target = sig2Target, lbdTarget = lbdTarget,
log = TRUE)


## Metropolis-Hastings Lasso Sampler

In this section, we introduce MHLS method, a Markov Chain Monte Carlo(MCMC) sampler for lasso estimator. Although bootstrapping is one of the most convenient sampling methods, it has a clear limitation which is that sampling from the conditional distribution is impossible. In contrast, MCMC sampler can easily draw samples from the conditional distribution. Here, we introduce MHLS function which draws lasso samples under the fixed active set, A.

weights <- rep(1,p)
lbd <- .37
LassoResult <- Lasso.MHLS(X = X,Y = Y,lbd = lbd, type = "lasso", weights = weights)
B0 <- LassoResult$B0 S0 <- LassoResult$S0
Result <- MHLS(X = X, PE = B0, sig2 = 1, lbd = 1,
weights = weights, B0 = B0, S0 = S0, niter = 100, burnin = 0,
PEtype = "coeff")
Result


We provide summary and plot functions for MHLS results.

summary(Result)

plot(Result, index=c(1,4,9))


## Post-selection Inference

Postinference.MHLS is a method for post-selection inference. The inference is provided by MHLS results from multiple chains. In order to make the method more robust, different PE values are used for each chain. After drawing samples of $(\hat{\beta}, S)$ from MH sampler, we then refit the estimator to remove the bias of the lasso estimator. The final output will be the $(1-a)$ quantile of each active coefficient.

Postinference.MHLS(X = X, Y = Y, lbd = lbd, sig2.hat = 1, alpha = .05,
nChain = 5, niterPerChain = 20, parallel = !(.Platform\$OS.type == "windows"))


## Try the EAlasso package in your browser

Any scripts or data that you put into this service are public.

EAlasso documentation built on Sept. 1, 2017, 9:03 a.m.