Using the SLOPE package

set.seed(12345)

This vignette illustrates the basic and advanced usage of the SLOPE package for R. To learn more about the theory behind SLOPE, see the SLOPE paper [@slope2014].

Synthetic data

For simplicity, we will use synthetic data constructed such that the true coefficient vector $\beta$ has very few nonzero entries. The details of the construction are unimportant, but are provided below for completeness.

# Problem parameters
n = 500                         # number of observations
p = 500                         # number of variables
k = 10                          # number of variables with nonzero coefficients
amplitude = 1.2*sqrt(2*log(p))  # signal amplitude (for noise level = 1)

# Design matrix
X <- local({
  probs = runif(p, 0.1, 0.5)
  probs = t(probs) %x% matrix(1,n,2)
  X0 = matrix(rbinom(2*n*p, 1, probs), n, 2*p)
  X0 %*% (diag(p) %x% matrix(1,2,1))
})

# Coefficient and response vectors
nonzero = sample(p, k)
beta = amplitude * (1:p %in% nonzero)
y = X %*% beta + rnorm(n)

In our example, the design matrix has entries in ${0,1,2}$.

knitr::kable(X[1:10,1:10], col.names=1:10)

Basic usage

The main entry point for the SLOPE package is the function SLOPE. To begin, we call SLOPE with all of the default options.

library(SLOPE)

result <- SLOPE(X, y)
print(result)

The output shows the selected variables and their coefficients. The achieved false discovery proportion (FDP) is close to 0.2, the default false discovery rate (FDR).

fdp <- function(selected) sum(beta[selected] == 0) / max(1, length(selected))
fdp(result$selected)

The target FDR can be adjusted using the parameter fdr:

result <- SLOPE(X, y, fdr=0.1)
fdp(result$selected)

Advanced usage

Noise level

By default, the noise level is estimated from the data. If $n$ is large enough compared to $p$, the classical unbiased estimate of $\sigma^2$ is used. Otherwise, the iterative SLOPE algorithm is used, as described in Section 3.2.3 of [@slope2014]. If the noise level is known, it can be specified directly, bypassing the iterative estimation procedure. For example:

result <- SLOPE(X, y, sigma=1)
fdp(result$selected)

Lambda sequence

The primary hyper-parameter in the SLOPE procedure is the decreasing sequence of $\lambda$ values used in the sorted L1 minimization problem: $$ \min_\beta \left{ \frac{1}{2} \Vert X\beta - y \Vert_2^2 + \sigma \cdot \sum_{i=1}^p \lambda_i |\beta|_{(i)} \right} $$ The $\lambda$ sequence can be set using the lambda parameter. By default, we use a sequence motivated by Gaussian designs, described in Section 3.2.2 of [@slope2014]. In most cases, this sequence should be preferred.

If you have an orthogonal design, the original Benjamini-Hochberg (BHq) sequence can also be used (see Section 1.1). We illustrate this below.

X.orth = sqrt(n) * qr.Q(qr(X))
y.orth = X.orth %*% beta + rnorm(n)
result = SLOPE(X.orth, y.orth, lambda='bhq')
fdp(result$selected)

Notice that if we use the BHq sequence on a non-orthogonal design, FDR control is lost:

result = SLOPE(X, y, lambda='bhq')
fdp(result$selected)

Therefore, the BHq sequence should only be used with orthogonal designs.

For completeness, we note that it is also possible to specify a custom sequence of $\lambda$ values. (You probably don't want to do this.)

References



Try the SLOPE package in your browser

Any scripts or data that you put into this service are public.

SLOPE documentation built on May 30, 2017, 12:34 a.m.