# dl.normalmeans: Normal Means Estimation and Hypothesis Testing with the... In NormalBetaPrime: Normal Beta Prime Prior

## Description

This function implements the Dirichlet-Laplace model of Bhattacharya et al. (2015) for obtaining a sparse estimate of θ = (θ_1, ..., θ_n) in the normal means problem,

X_i = θ_i + ε_i,

where ε_i \sim N(0, σ^2). This is achieved by placing the Dirichlet-Laplace (DL) prior on the individual θ_i's. The sparsity parameter τ in the Dir(τ, …, τ) prior can be specified a priori, or it can be estimated from the data, either by: 1) using the estimate of sparsity level by van der Pas et al. (2014), 2) by taking a restricted marginal maximum likelihood (REML) estimate on [1/n, 1], 3) endowing τ with a uniform prior, U(1/n, 1), or 4) endowing τ with a standard Cauchy prior truncated to [1/n, 1]. Multiple testing can also be performed by either thresholding the shrinkage factor in the posterior mean, or by examining the marginal 95 percent credible intervals.

## Usage

 1 2 3 4 dl.normalmeans(x, tau.est=c("fixed", "est.sparsity", "reml", "uniform", "truncatedCauchy"), tau=1/length(x), sigma2=1, var.select = c("threshold", "intervals"), max.steps=10000, burnin=5000) 

## Arguments

 x an n \times 1 multivariate normal vector. tau.est The method for estimating the sparsity parameter τ. If "fixed" is chosen, then τ is not estimated from the data. If "est.sparsity" is chosen, the empirical Bayes estimate of sparsity level from van der Pas et al. (2014) is used. If "reml" is chosen, τ is estimated from restricted marginal maximum likelihood on [1/n, 1]. If "uniform" is chosen, τ is estimated with a uniform prior on [1/n, 1]. If "truncatedCauchy" is chosen, τ is estimated with a standard Cauchy prior truncated to [1/n, 1]. tau The concentration parameter τ in the Dir(τ, …, τ) prior. Controls the sparsity of the model. Defaults to 1/n, but user may specify a different value for tau (τ > 0). This is ignored if the method "est.sparsity", "reml", "unif", or "truncatedCauchy" is used to estimate τ. sigma2 The variance parameter. Defaults to 1. User needs to specify the noise parameter if it is different from 1 (σ^2 > 0). var.select The method of variable selection. "threshold" selects variables by thresholding the shrinkage factor in the posterior mean. "intervals" will classify θ_i's as either signals (θ_i \neq 0) or as noise (θ_i = 0) by examining the 95 percent posterior credible intervals. max.steps The total number of iterations to run in the Gibbs sampler. Defaults to 10,000. burnin The number of burn-in iterations for the Gibbs sampler. Defaults to 5,000.

## Details

The function implements sparse estimation and multiple hypothesis testing on a multivariate normal mean vector, θ = (θ_1, …, θ_n) with the Dirichlet-Laplace prior of Bhattacharya et al. (2015). The full model is:

X | θ \sim N_n( θ, σ^2 I_n),

θ_i | (ψ_i, φ_i, ω) \sim N(0, σ^2 ψ_i φ_i^2 ω^2), i = 1, …, n,

ψ_i \sim Exp(1/2), i = 1, …, n,

(φ_1, …, φ_n) \sim Dir(τ, …, τ),

ω \sim G(n τ, 1/2).

τ is the main parameter that controls the sparsity of the solution. It can be estimated by: the empirical Bayes estimate of the estimated sparsity ("est.sparsity") given in van der Pas et al. (2014), restricted marginal maximum likelihood in the interval [1/n, 1] ("reml"), a uniform prior τ \sim U(1/n, 1) ("uniform"), or by a standard Cauchy prior truncated to [1/n, 1] ("truncatedCauchy").

The posterior mean is of the form [E(1-κ_i | X_1, …, X_n)] X_i, i = 1, …, n. The "threshold" method for variable selection is to classify θ_i as signal (θ_i \neq 0) if E(1 - κ_i | X_1, …, X_n) > 1/2.

## Value

The function returns a list containing the following components:

 theta.hat The posterior mean of θ. theta.med The posterior median of θ. theta.var The posterior variance estimates for each θ_i, i=1,…,p. theta.intervals The 95 percent credible intervals for all n components of θ. dl.classifications An n-dimensional binary vector with "1" if the covariate is selected and "0" if it is deemed irrelevant. tau.estimate The estimate of the sparsity level. If user specified "fixed" for tau.est, then it simply returns the fixed τ. If user specified "uniform" or "truncatedCauchy", it returns the posterior mean of π(τ |X_1, …, X_n).

## Author(s)

Ray Bai and Malay Ghosh

## References

Bhattacharya, A., Pati, D., Pillai, N. S., and Dunson, D. B. (2015). "Dirichlet-Laplace priors for optimal shrinkage." Journal of the American Statistical Association, 110(512):1479-1490.

van der Pas, S. L., Kleijn, B. J. K., and van der Vaart, A. W. (2014). "The horseshoe estimator: Posterior concentration around nearly black vectors." Electronic Journal of Statistics, 8(2):2585-2618.

van der Pas, S. L., Szabo, B. T., and van der Vaart, A. (2017). "Adaptive posterior contraction rates for the horseshoe." Electronic Journal of Statistics, 11(2):3196-3225.

## Examples

  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 ################################################### # Example on synthetic data. # # 5 percent of entries in theta are set to A = 7. # ################################################### n <- 100 sparsity.level <- 5 A <- 7 # Initialize theta vector of all zeros theta.true <- rep(0,n) # Set (sparsity.level) percent of them to be A q <- floor(n*(sparsity.level/100)) # Pick random indices of theta.true to equal A signal.indices <- sample(1:n, size=q, replace=FALSE) ################### # Generate data X # ################### theta.true[signal.indices] <- A X <- theta.true + rnorm(n,0,1) ######################### # Run the DL model on X # ######################### # For optimal performance, should set max.steps=10,000 and burnin=5000. # Estimate tau from the empirical Bayes estimate of sparsity level dl.model <- dl.normalmeans(X, tau.est="est.sparsity", sigma2=1, var.select="threshold", max.steps=1000, burnin=500) dl.model$theta.med # Posterior median estimates dl.model$dl.classifications # Classifications dl.model\$tau.estimate # Estimate of sparsity level 

NormalBetaPrime documentation built on May 2, 2019, 1:39 p.m.