qut.MC: Monte Carlo simulation for estimating the Quantile Universal...
In lass0: Lasso-Zero for (High-Dimensional) Linear Regression

Description Usage Arguments Details Value References See Also Examples

View source: R/qut.MC.R

Performs a Monte Carlo simulation to estimate the distribution of the null thresholding statistic required for computation of the quantile universal threshold, and computes its upper alpha-quantile if alpha is provided.

qut.MC(X, q = nrow(X), M = 30, alpha = NULL, sigma = NULL,
  intercept = TRUE, standardizeX = TRUE, standardizeG = NULL,
  MCrep = 100, GEVapprox = TRUE, parallel = FALSE,
  var.subset = 1:ncol(X))

`X`	input matrix of dimension `n x p`, previously mean-centered and standardized if necessary!
`q`	size of noise dictionaries. A noise dictionary consists in a Gaussian matrix G of size `n x q` concatenated horizontally to the input matrix X. Default is `q = nrow(X)`.
`M`	number of noise dictionaries used.
`alpha`	level of the quantile universal threshold. By default `alpha = NULL` and no quantile is returned.
`sigma`	standard deviation of the noise. If `sigma = NULL` (default), the statistic of interest if pivotized.
`intercept`	if `TRUE` (default), the columns of `X` are mean-centered before the analysis.
`standardizeX`	whether the columns of `X` should be standardized to have unit standard deviation. Default is `TRUE`.
`standardizeG`	either a positive numerical value indicating the desired Euclidean norm of all columns of the noise dictionaries, or a logical value indicating whether the columns of the noise dictionaries should be standardized to have unit standard deviation. If `NULL` (default), then it is set to `standardizeG = standardizeX`.
`MCrep`	number of Monte Carlo replications. Default is `MCrep = 100.`
`GEVapprox`	whether to approximate the distribution of the null thresholding statistic by a GEV distribution. If `TRUE`, the maximum likelihood estimates of the GEV parameters are computed on the Monte Carlo sample. Default if `TRUE`.
`parallel`	if `TRUE`, use parallel `foreach` to perform the Monte Carlo simulation. Must register parallel beforehand, e.g. with `doParallel`. Default is `FALSE`.
`var.subset`	subset of variables for which QUT is computed (i.e. we will compute the parameter value for which P(betahat[var.subset]) = 0) = 1-alpha when beta = 0.

If the noise level sigma is known, the statistic of interest is simply the sup-norm of the Lasso-Zero coefficients obtained under the null hypothesis (i.e. when all coefficients all zero) when the threshold tau is set to 0, and its upper alpha-quantile is the quantile universal threshold. If sigma = NULL (sigma unknown) a pivotized statistic is used, which is obtained by dividing the statistic described above by the MAD of all nonzero noise coefficients obtained by Lasso-Zero.

An object of class "qut.MC", which is a list with the following components:

`allMC`	all `MCrep` realizations of the null thresholding statistic of interest (pivotized if `sigma = NULL`).
`GEVpar`	MLE estimates of the GEV distribution parameters (`NULL` if `GEVapprox` was set to `FALSE`).
`GEVfit`	set to NULL is GEVapprox is FALSE. If GEVapprox is TRUE, GEVfit is either the result of the gev.fit function, or the character string "error" if gev.fit produced an error.
`upperQuant`	upper alpha-quantile of the null thresholding statistis (either the empirical quantile, or the quantile of the fitted GEV distribution).
`call`	matched call.
`lass0settings`	a list containing the chosen settings for the computation of lasso-zero: `q`, `M`, `sigma`, `intercept`, `standardizeX` and `standardizeG`. When an object of type `"qut.MC"` is supplied to the `lass0` function, a warning message appears if the corresponding arguments passed to `lass0` are different.

Descloux, P., & Sardy, S. (2018). Model selection with lasso-zero: adding straw to the haystack to better find needles. arXiv preprint arXiv:1805.05133. https://arxiv.org/abs/1805.05133

Giacobino, C., Sardy, S., Diaz-Rodriguez, J., & Hengartner, N. (2017). Quantile universal threshold. Electronic Journal of Statistics, 11(2), 4701-4722.

lass0

### Fast toy example with 5x10 input matrix and a small number (MCrep = 50)
### of Monte Carlo replications.
### Illustrates how to tune Lasso-Zero with QUT for the same input matrix but
### different responses and/or different alpha values, without calling
### qut.MC several times:

### (for faster computation when X and MCrep are larger: register a parallel 
### backend and choose parallel = TRUE when calling lass0 and qut.MC functions.)

set.seed(3)

## input matrix:
n <- 5
p <- 10
X <- matrix(rnorm(n*p), n, p)

## two sparse vectors and corresponding responses:
S1 <- 1:2 # first support
beta1 <- numeric(p)
beta1[S1] <- 5
y1 <- X[, S1] %*% beta1[S1] + rnorm(n)
S2 <- 3:4 # second support
beta2 <- numeric(p)
beta2[S2] <- 5
y2 <- X[, S2] %*% beta2[S2] + rnorm(n)


## Monte Carlo simulation giving empirical distribution for the statistic P (see paper below):
qut.MC.output <-  qut.MC(X, parallel = FALSE, MCrep = 50)

## lasso-zero estimates:

## for y1 with alpha = 0.1:
lass01 <- lass0(X, y1, alpha = 0.1, qut.MC.output = qut.MC.output, parallel = FALSE)
plot(lass01)

## for y2 with alpha = 0.05:
lass02 <- lass0(X, y2, alpha = 0.05, qut.MC.output = qut.MC.output, parallel = FALSE)
plot(lass02)