`EAinference`

package is designed to facilitate simulation based inference of lasso type estimators.
The package includes

- Fitting the lasso, group lasso, scaleld lasso and scaled group lasso models
- Gaussian and wild multiplier bootstrap for lasso, group lasso, scaled lasso, scaled group lasso and their de-biased estimators
- Importance sampler for approximating p-values in these methods
- Markov chain Monte Carlo lasso sampler with applications in post-selection inference.

Let the unknown true model be $$ y = X\beta_0 + \epsilon,$$ where $\beta_0$ is a unknown true coefficient vector and $\epsilon_i \sim_{iid} N(0,\sigma^2)$.

We use following loss functions.
$$ \ell_{grlasso}(\beta ; \lambda) = \frac{1}{2}||y-X\beta||^2_2 + n\lambda\sum_j w_j||\beta_{(j)}||*2,$$
$$ \ell*{sgrlasso}(\beta, \sigma ; \lambda) = \frac{1}{2\sigma}||y-X\beta||^2_2 + \frac{\sigma}{2} + n\lambda\sum_j w_j||\beta_{(j)}||_2,$$
where $grlasso$ and $sgrlasso$ represent group lasso and scaled group lasso, respectively,
and $w_j$ is the weight factor for the $j$-th group. Loss functions for lasso and
scaled lasso can be treated as special cases of group lasso and group scaled lasso
when every group is a singleton, respectively.

`lassoFit`

computes lasso, group lasso, scaled lasso and scaled group lasso solutions.
Users can either provide the value of $\lambda$ or choose to use cross-validation
by setting `lbd = "cv.min"`

or `lbd = "cv.1se"`

. By letting `lbd = "cv.min"`

, users
can have $\lambda$ which gives minimum mean-squared error, while `lbd = "cv.1se"`

is the largest $\lambda$ within 1 standard deviaion error bound of `"cv.min"`

.

library(EAinference) set.seed(1234) n <- 30; p <- 50 Niter <- 10 group <- rep(1:(p/5), each = 5) weights <- rep(1, p/5) beta0 <- c(rep(1,5), rep(0, p-5)) X <- matrix(rnorm(n*p), n) Y <- X %*% beta0 + rnorm(n) # set lambda = .5 lassoFit(X = X, Y = Y, lbd = .5, group = group, weights = weights, type="grlasso") # use cross-validation lassoFit(X = X, Y = Y, lbd = "cv.1se", group = group, weights = weights, type="grlasso")

`PBsampler`

supports two bootstrap methods; Gaussian bootstrap and wild multiplier bootstrap.
Due to the fact that the sampling distirbution of $(\hat{\beta}, S)$, coefficient estimator and its subgradient, is characterized by $(\beta_0, \sigma^2, \lambda)$, users are required to provide `PE`

(either the estimate of $\beta_0$ or the estimate of $E(y) = X\beta_0$), `sig2`

(or estimate of $\sigma^2$)
and `lbd`

($\lambda$ from above loss functions).

By specifying two sets of arguments,
`(PE_1, sig2_1, lbd_1)`

and `(PE_2, sig2_2, lbd_2)`

, users can sample from the mixture distribution.
In this way, samples will be drawn from `(PE_1, sig2_1, lbd_1)`

with 1/2 probability and from
`(PE_2, sig2_2, lbd_2)`

with another 1/2 probability.

set.seed(1234) n <- 5; p <- 10 Niter <- 3 group <- rep(1:(p/5), each = 5) weights <- rep(1, p/5) beta0 <- c(rep(1,5), rep(0, p-5)) X <- matrix(rnorm(n*p), n) Y <- X %*% beta0 + rnorm(n) # # Using non-mixture distribution # PBsampler(X = X, PE_1 = rep(0, p), sig2_1 = 1, lbd_1 = .5, weights = weights, group = group, type = "grlasso", niter = Niter, parallel = FALSE) # # Using mixture distribution # PBsampler(X = X, PE_1 = rep(0, p), sig2_1 = 1, lbd_1 = .5, PE_2 = rep(1, p), sig2_2 = 2, lbd_2 = .3, weights = weights, group = group, type = "grlasso", niter = Niter, parallel = FALSE)

Importance sampler enables users to access the inference of an extreme region. This is done by using a proposal distiribution that is denser around the extreme region.

Say that we are interested in computing the expectation of a function of a random variable, $h(X)$. Let $f(x)$ be the true or target distribution and $g(x)$ be the proposal distribution. We can approximate the expectation by a weighted average of samples drawn from the proposal distribution as follows,

$$
\begin{eqnarray}
E_f\Big[h(X)\Big] &=& E_g \Big[h(X)\frac{f(X)}{g(X)} \Big]\
&\approx& \frac{1}{N}\sum_{i=1}^N h(x_i)\frac{f(x_i)}{g(x_i)}.
\end{eqnarray}
$$
By using `hdIS`

method, one can easily compute the importance weight which is the ratio of
target density over proposal density; $f(x_i)/g(x_i)$ from above equation.

Users can simply draw samples from the proposal distribution using `PBsampler`

and
plug in the result into `hdIS`

with target distribution parameters in order to
compute the importance weights.

# Target distribution parameter PETarget <- rep(0, p) sig2Target <- .5 lbdTarget <- .37 # Proposal distribution parameter PEProp1 <- rep(1, p) sig2Prop1 <- .5 lbdProp1 <- 1 # Draw samples from proposal distribution PB <- PBsampler(X = X, PE_1 = PEProp1, sig2_1 = sig2Prop1, lbd_1 = lbdProp1, weights = weights, group = group, niter = Niter, type="grlasso", PEtype = "coeff") # Compute importance weights hdIS(PB, PETarget = PETarget, sig2Target = sig2Target, lbdTarget = lbdTarget, log = TRUE)

In this section, we introduce `MHLS`

method, a Markov Chain Monte Carlo(MCMC) sampler for lasso estimator.
Although bootstrapping is one of the most convenient sampling methods, it has a clear
limitation which is that sampling from the conditional distribution is impossible.
In contrast, MCMC sampler can easily draw samples from the
conditional distribution. Here, we introduce `MHLS`

function which draws lasso samples
under the fixed active set, A.

weights <- rep(1,p) lbd <- .37 LassoResult <- lassoFit(X = X,Y = Y,lbd = lbd, type = "lasso", weights = weights) B0 <- LassoResult$B0 S0 <- LassoResult$S0 Result <- MHLS(X = X, PE = B0, sig2 = 1, lbd = 1, weights = weights, B0 = B0, S0 = S0, niter = 100, burnin = 0, PEtype = "coeff") Result

We provide summary and plot functions for `MHLS`

results.

```
summary(Result)
```

plot(Result, index=c(1,4,9))

`postInference.MHLS`

is a method for post-selection inference.
The inference is provided by `MHLS`

results from multiple chains.
In order to achieve the robust result, different `PE`

values are used
for each chain. After drawing samples of $(\hat{\beta}, S)$ with MH sampler,
we then refit the estimator to remove the bias of the lasso estimator. The final
output is the $(1-a)$ quantile of each active coefficient.

postInference.MHLS(X = X, Y = Y, lbd = lbd, sig2.hat = 1, alpha = .05, method = "coeff", nChain = 5, niterPerChain = 20, parallel = !(.Platform$OS.type == "windows"))

**Any scripts or data that you put into this service are public.**

Embedding an R snippet on your website

Add the following code to your website.

For more information on customizing the embed code, read Embedding Snippets.