# nbp.normalmeans: Normal Means Estimation and Hypothesis Testing with the NBP... In NormalBetaPrime: Normal Beta Prime Prior

## Description

This function implements the NBP model of Bai and Ghosh (2018) for obtaining a sparse estimate of θ = (θ_1, ..., θ_n) in the normal means problem,

X_i = θ_i + ε_i,

where ε_i \sim N(0, σ^2). This is achieved by placing the normal-beta prime (NBP) prior on the individual θ_i's. The sparsity parameter a can be specified a priori, or it can be estimated from the data, either by: 1) using the estimate of sparsity level by van der Pas et al. (2014), 2) by taking a restricted marginal maximum likelihood (REML) estimate on [1/n, 1], 3) endowing τ with a uniform prior, U(1/n, 1), or 4) endowing τ with a standard Cauchy prior truncated to [1/n, 1]. Multiple testing can also be performed by either thresholding the shrinkage factor in the posterior mean, or by examining the marginal 95 percent credible intervals.

## Usage

 1 2 3 4 nbp.normalmeans(x, a.est=c("fixed", "est.sparsity", "reml", "uniform", "truncatedCauchy"), a=1/length(x), b=1/2+1/length(x), sigma2=1, var.select = c("threshold", "intervals"), max.steps=10000, burnin=5000) 

## Arguments

 x an n \times 1 multivariate normal vector. a.est The method for estimating the sparsity parameter a. If "fixed" is chosen, then a is not estimated from the data. If "est.sparsity" is chosen, the empirical Bayes estimate of sparsity level from van der Pas et al. (2014) is used. If "reml" is chosen, a is estimated from restricted marginal maximum likelihood on [1/n, 1]. If "uniform" is chosen, a is estimated with a uniform prior on [1/n, 1]. If "truncatedCauchy" is chosen, a is estimated with a standard Cauchy prior truncated to [1/n, 1]. a The shape parameter a in the β'(a,b) prior. Controls the sparsity of the model. Defaults to 1/n. User may specify a different value for a (a > 0). b The shape parameter b in the β'(a,b) prior. Defaults to 1/2+1/n. User may specify a different value (b > 0). sigma2 The variance parameter. Defaults to 1. User needs to specify the noise parameter if it is different from 1 (σ^2 > 0). var.select The method of variable selection. "threshold" selects variables by thresholding the shrinkage factor in the posterior mean. "intervals" will classify θ_i's as either signals (θ_i \neq 0) or as noise (θ_i = 0) by examining the 95 percent posterior credible intervals. max.steps The total number of iterations to run in the Gibbs sampler. Defaults to 10,000. burnin The number of burn-in iterations for the Gibbs sampler. Defaults to 5,000.

## Details

The function implements sparse estimation and multiple hypothesis testing on a multivariate normal mean vector, θ = (θ_1, …, θ_n) with the NBP prior of Bai and Ghosh (2019). The full model is:

X | θ \sim N_n( θ, σ^2 I_n),

θ_i | ω_i^2 \sim N(0, σ^2 ω_i^2), i = 1, …, n,

ω_i^2 \sim β'(a,b), i = 1, …, n,

where a is the main parameter that controls the sparsity of the solution. The sparsity parameter a can be estimated by: the empirical Bayes estimate of the estimated sparsity ("est.sparsity") given in van der Pas et al. (2014), restricted marginal maximum likelihood in the interval [1/n, 1] ("reml"), a uniform prior a \sim U(1/n, 1) ("uniform"), or by a standard Cauchy prior truncated to [1/n, 1] ("truncatedCauchy").

The posterior mean is of the form [E(1-κ_i | X_1, …, X_n)] X_i, i = 1, …, n. The "threshold" method for variable selection is to classify θ_i as signal (θ_i \neq 0) if E(1 - κ_i | X_1, …, X_n) > 1/2.

## Value

The function returns a list containing the following components:

 theta.hat The posterior mean of θ. theta.med The posterior median of θ. theta.var The posterior variance estimates for each θ_i, i=1,…,p. theta.intervals The 95 percent credible intervals for all n components of θ. nbp.classifications An n-dimensional binary vector with "1" if the covariate is selected and "0" if it is deemed irrelevant. a.estimate The estimate of the sparsity level. If user specified "fixed" for a.est, then it simply returns the fixed a. If user specified "uniform" or "truncatedCauchy", it returns the posterior mean of π(a |X_1, …, X_n).

## Author(s)

Ray Bai and Malay Ghosh

## References

Bai, R. and Ghosh, M. (2019). "Large-scale multiple hypothesis testing with the normal-beta prime prior." Pre-print, arXiv:1807.02421.

van der Pas, S. L., Kleijn, B. J. K., and van der Vaart, A. W. (2014). "The horseshoe estimator: Posterior concentration around nearly black vectors." Electronic Journal of Statistics, 8(2):2585-2618.

van der Pas, S. L., Szabo, B. T., and van der Vaart, A. (2017). "Adaptive posterior contraction rates for the horseshoe." Electronic Journal of Statistics, 11(2):3196-3225.

## Examples

  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 ################################################### # Example on synthetic data. # # 5 percent of entries in theta are set to A = 7. # ################################################### n <- 40 sparsity.level <- 5 # 5 percent of entries will be nonzero A <- 5 # Initialize theta vector of all zeros theta.true <- rep(0,n) # Set (sparsity.level) percent of them to be A q <- floor(n*(sparsity.level/100)) signal.indices <- sample(1:n, size=q, replace=FALSE) ################### # Generate data X # ################### theta.true[signal.indices] <- A X <- theta.true + rnorm(n,0,1) ########################## # Run the NBP model on X # ########################## # For optimal performance, should set max.steps=10,000 and burnin=5000. # Estimate sparsity parameter 'a' with a uniform prior. nbp.model <- nbp.normalmeans(X, a.est="uniform", sigma2=1, var.select="threshold", max.steps=1000, burnin=500) nbp.model$theta.hat # Posterior mean estimates nbp.model$theta.intervals # Posterior credible intervals nbp.model$nbp.classifications # Classifications nbp.model$a.estimate # Estimate of sparsity level 

NormalBetaPrime documentation built on May 2, 2019, 1:39 p.m.