Description Usage Arguments Details Value Author(s) References Examples
This function provides a fully Bayesian approach for obtaining a sparse estimate of θ = (θ_1, ..., θ_n) in the normal means problem,
X_i = θ_i + ε_i,
where ε_i \sim N(0, σ^2). This is achieved by placing the inverse gamma-gamma (IGG) prior on the individual θ_i's. Variable selection can also be performed by using either thresholding the shrinkage factor in the posterior mean, or by examining the marginal 95 percent credible intervals.
1 2 | igg.normalmeans(x, a=NA, b=NA, sigma2=NA, var.select = c("threshold", "intervals"),
max.steps=10000, burnin=5000)
|
x |
an n \times 1 multivariate normal vector. |
a |
The parameter for IG(a,1). If not specified ( |
b |
The parameter for G(b,1). If not specified ( |
sigma2 |
The variance parameter. If the user does not specify this ( |
var.select |
The method of variable selection. |
max.steps |
The total number of iterations to run in the Gibbs sampler. Defaults to 10,000. |
burnin |
The number of burn-in iterations for the Gibbs sampler. Defaults to 5,000. |
The function performs sparse estimation of θ = (θ_1, ..., θ_n) in normal means problem. The full model is:
X | θ \sim N_n( θ, σ^2 I_n),
θ_i | (λ_i,ξ_i, σ^2) \sim N(0, σ^2 λ_i ξ_i), i = 1, ..., n,
λ_i \sim IG(a,1), i = 1, ..., n,
ξ_i \sim G(b,1), i = 1, ..., n,
σ^2 \propto 1/σ^2.
If σ^2 is known or estimated separately, the Gibbs sampler does not sample from the full conditional for σ^2.
As described in Bai and Ghosh (2018), the posterior mean is of the form [E(1-κ_i | X_i)] X_i, i = 1, ..., n. The "threshold" method for variable selection is to classify θ_i as signal (θ_i \neq 0) if
E(1 - κ_i | X_i) > 1/2,
and to classify θ_i as noise (θ_i = 0) if
E(1 - κ_i |X_i) ≤q 1/2.
The function returns a list containing the following components:
theta.hat |
The posterior mean estimate of θ. |
theta.med |
The posterior median estimate of θ. |
theta.intervals |
The lower and upper endpoints of the 95 percent credible intervals for all n components of θ. |
igg.classifications |
An n-dimensional binary vector with "1" if the covariate is selected and "0" if it is deemed irrelevant. |
Ray Bai and Malay Ghosh
Bai, R. and Ghosh, M. (2018). "The Inverse Gamma-Gamma Prior for Optimal Posterior Contraction and Multiple Hypothesis Testing." Submitted, arXiv:1711.07635.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 | ###################################################
###################################################
## Example on synthetic data. ##
## 5 percent of entries in a sparse vector theta ##
## are set equal to signal value A =7. ##
###################################################
###################################################
n <- 100
sparsity.level <- 5
A <- 7
# Initialize theta vector of all zeros
theta.true <- rep(0,n)
# Set (sparsity.level)% of them to be A
q <- floor(n*(sparsity.level/100))
# Pick random indices of theta.true to equal A
signal.indices <- sample(1:n, size=q, replace=FALSE)
#######################
# Generate true theta #
#######################
theta.true[signal.indices] <- A
#######################################################
# Generate data X by corrupting theta.true with noise #
#######################################################
X <- theta.true + rnorm(n,0,1)
##########################
# Run the IGG model on X #
##########################
# For optimal performance, should set max.steps=10,000 and burnin=5000.
igg.model <- igg.normalmeans(X, var.select="threshold", max.steps=2000, burnin=1000)
################################
# Calculate mean squared error #
################################
igg.mse <- sum((igg.model$theta.med-theta.true)^2)/n
igg.mse
# To compute misclassification probability and false discovery rate
true.classifications <- rep(0,n)
signal.indicies <- which(theta.true != 0)
true.classifications[signal.indices] <- 1
igg.classifications <- igg.model$igg.classifications
false.pos <- length(which(igg.classifications != 0 & true.classifications == 0))
num.pos <- length(which(igg.classifications != 0))
false.neg <- length(which(igg.classifications == 0 & true.classifications != 0))
########################################################
# Calculate FDR and misclassification probability (MP) #
########################################################
igg.fdr <- false.pos/num.pos
igg.fdr
igg.mp <- (false.pos+false.neg)/n
igg.mp
## Not run:
#
#
######################################################
######################################################
## Prostate cancer data analysis. ##
## Running this code will allow you to reproduce ##
## the results in Section 6 of Bai and Ghosh (2018) ##
######################################################
######################################################
# Load the data
data(singh2002)
attach(singh2002)
# Only look at the gene expression data.
# First 50 rows are the cancer patients,
# and the last 52 rows are the control subjects.d
prostate.data <- singh2002$x
# Perform 2-sample t-test and obtain z-scores
n <- ncol(prostate.data)
test.stat <- rep(NA,n)
z.scores <- rep(NA, n)
# Fill in the vectors
for(i in 1:n){
test.stat[i] <- t.test(prostate.data[51:102,i],
prostate.data[1:50,i])$statistic
z.scores[i] <- qnorm(pt(test.stat[i],100))
}
#######################################
# Apply IGG model on the z-scores. #
# Here sigma2 is known with sigma2= 1 #
#######################################
igg.model <- igg.normalmeans(z.scores, sigma2=1, var.select="threshold")
##########################################
# How many genes flagged as significant? #
##########################################
num.sig <- sum(igg.model$igg.classifications != 0)
num.sig
#######################################################
# Estimated effect size for 10 most significant genes #
#######################################################
most.sig <- c(610,1720,332,364,914,3940,4546,1068,579,4331)
igg.model$theta.hat[most.sig]
## End(Not run)
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.