set.seed(12345)

This vignette illustrates the basic and advanced usage of the SLOPE package for R. To learn more about the theory behind SLOPE, see the SLOPE paper [@slope2014].

For simplicity, we will use synthetic data constructed such that the true coefficient vector $\beta$ has very few nonzero entries. The details of the construction are unimportant, but are provided below for completeness.

# Problem parameters n = 500 # number of observations p = 500 # number of variables k = 10 # number of variables with nonzero coefficients amplitude = 1.2*sqrt(2*log(p)) # signal amplitude (for noise level = 1) # Design matrix X <- local({ probs = runif(p, 0.1, 0.5) probs = t(probs) %x% matrix(1,n,2) X0 = matrix(rbinom(2*n*p, 1, probs), n, 2*p) X0 %*% (diag(p) %x% matrix(1,2,1)) }) # Coefficient and response vectors nonzero = sample(p, k) beta = amplitude * (1:p %in% nonzero) y = X %*% beta + rnorm(n)

In our example, the design matrix has entries in ${0,1,2}$.

knitr::kable(X[1:10,1:10], col.names=1:10)

The main entry point for the SLOPE package is the function `SLOPE`

. To begin, we call `SLOPE`

with all of the default options.

library(SLOPE) result <- SLOPE(X, y) print(result)

The output shows the selected variables and their coefficients. The achieved false discovery proportion (FDP) is close to 0.2, the default false discovery rate (FDR).

fdp <- function(selected) sum(beta[selected] == 0) / max(1, length(selected)) fdp(result$selected)

The target FDR can be adjusted using the parameter `fdr`

:

result <- SLOPE(X, y, fdr=0.1) fdp(result$selected)

By default, the noise level is estimated from the data. If $n$ is large enough compared to $p$, the classical unbiased estimate of $\sigma^2$ is used. Otherwise, the *iterative SLOPE* algorithm is used, as described in Section 3.2.3 of [@slope2014]. If the noise level is known, it can be specified directly, bypassing the iterative estimation procedure. For example:

result <- SLOPE(X, y, sigma=1) fdp(result$selected)

The primary hyper-parameter in the SLOPE procedure is the decreasing sequence of $\lambda$ values used in the sorted L1 minimization problem:
$$
\min_\beta \left{
\frac{1}{2} \Vert X\beta - y \Vert_2^2 +
\sigma \cdot \sum_{i=1}^p \lambda_i |\beta|_{(i)}
\right}
$$
The $\lambda$ sequence can be set using the `lambda`

parameter. By default, we use a sequence motivated by Gaussian designs, described in Section 3.2.2 of [@slope2014]. In most cases, this sequence should be preferred.

If you have an orthogonal design, the original Benjamini-Hochberg (BHq) sequence can also be used (see Section 1.1). We illustrate this below.

X.orth = sqrt(n) * qr.Q(qr(X)) y.orth = X.orth %*% beta + rnorm(n) result = SLOPE(X.orth, y.orth, lambda='bhq') fdp(result$selected)

Notice that if we use the BHq sequence on a non-orthogonal design, FDR control is lost:

result = SLOPE(X, y, lambda='bhq') fdp(result$selected)

Therefore, the BHq sequence should only be used with orthogonal designs.

For completeness, we note that it is also possible to specify a custom sequence of $\lambda$ values. (You probably don't want to do this.)

**Any scripts or data that you put into this service are public.**

Embedding an R snippet on your website

Add the following code to your website.

For more information on customizing the embed code, read Embedding Snippets.