mbsp.tpbn: MBSP Model with Three Parameter Beta Normal (TPBN) Family

Description Usage Arguments Details Value Author(s) References Examples

View source: R/mbsp.tpbn.R

Description

This function provides a fully Bayesian approach for obtaining a sparse estimate of the p \times q coefficients matrix B in the multivariate linear regression model,

Y = XB+E,

using the three parameter beta normal (TPBN) family.

Usage

1
mbsp.tpbn(X, Y, u=0.5, a=0.5, tau=NA, max.steps=15000, burnin=5000)

Arguments

X

design matrix of n samples and p covariates.

Y

response matrix of n samples and q response variables.

u

The first parameter in the TPBN family. Defaults to u=0.5 for the horseshoe prior.

a

The second parameter in the TPBN family. Defaults to a=0.5 for the horseshoe prior.

tau

The global parameter. If the user does not specify this (tau=NA), the Gibbs sampler will use τ = 1/(p*n*log(n)). The user may also specify a value for τ between 0 and 1, otherwise it defaults to τ = 1/(p*n*log(n)).

max.steps

The total number of iterations to run in the Gibbs sampler. Defaults to 15,000.

burnin

The number of burn-in iterations for the Gibbs sampler. Defaults to 5,000.

Details

The function performs sparse estimation of B and variable selection from the p covariates. The lower and upper endpoints of the 95 percent posterior credible intervals for each of the pq elements of B are also returned so that the user may assess uncertainty quantification.

In the three parameter beta normal (TPBN) family, (u,a)=(0.5,0.5) corresponds to the horseshoe prior, (u,a)=(1,0.5) corresponds to the Strawderman-Berger prior, and (u,a)=(1,a), a > 0 corresponds to the normal-exponential-gamma (NEG) prior. This function uses the horseshoe prior as the default shinkrage prior.

Value

The function returns a list containing the following components:

B.est

the point estimate of the p \times q matrix B (taken as the componentwise posterior median for all pq entries).

lower.endpoints

The 2.5th percentile of the posterior density (or the lower endpoint of the 95 percent credible interval) for all pq entries of B.

upper.endpoints

The 97.5th percentile of the posterior density (or the upper endpoint of the 95 percent credible interval) for all pq entries of B.

active.predictors

The active (nonzero) covariates chosen by our model from the p total predictors.

Author(s)

Ray Bai and Malay Ghosh

References

Armagan, A., Clyde, M., and Dunson, D.B. (2011) Generalized Beta Mixtures of Gaussians. In J. Shawe-taylor, R. Zemel, P. Bartlett, F. Pereira, and K. Weinberger (Eds.) Advances in Neural Information Processing Systems 24, 523-531.

Bai, R. and Ghosh, M. (2018). High-Dimensional Multivariate Posterior Consistency Under Global-Local Shrinkage Priors. Revision at Journal of Multivariate Analysis. arXiv:1711.07635

Berger, J. (1980). A Robust Generalized Bayes Estimator and Confidence Region for a Multivariate Normal Mean. Annals of Statistics, 8(4): 716-761.

Carvalho, C.M., Polson, N.G., and Scott., J.G. (2010). The Horseshoe Estimator for Sparse Signals. Biometrika, 97(2): 465-480. Strawderman, W.E. (1971). Proper Bayes Minimax Estimators of the Multivariate Normal Mean. Annals of Mathematical STatistics, 42(1): 385-388.

Examples

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
#############################################
#############################################
## EXAMPLE ON SYNTHETIC DATA:              ##
## Can change n, p, q, p.act, max.steps,   ##
## and burnin below to reproduce the       ##
## simulations from Section 5.1 of Bai     ##
## and Ghosh                               ##
#############################################
#############################################
n <- 50
p <- 10
q <- 3
# Active number of predictors is 2
p.act <- 2

#############################
# Generate design matrix X. #
#############################
times <- 1:p
rho <- 0.5
H <- abs(outer(times, times, "-"))
V <- rho^H
mu <- rep(0, p)
# Rows of X are simulated from MVN(0,V)
X <- mvrnorm(n, mu, V)
# Center X
X <- scale(X, center=TRUE, scale=FALSE)

#########################################
# Generate true coefficient matrix B_0. #
#########################################
# Entries in nonzero rows are drawn from Unif[(-5,-0.5)U(0.5,5)] 
B.act <- runif(p.act*q,-5,4)
disjoint <- function(x){
         if(x <= -0.5)
            return(x)
         else
           return(x+1)
     }
B.act <- matrix(sapply(B.act, disjoint),p.act,q)

# Set rest of the rows equal to 0
B.true <- rbind(B.act,matrix(0,p-p.act,q))
B.true <- B.true[sample(1:p),] # permute the rows

########################################
# Generate true error covariance Sigma #
########################################
sigma.sq=2
times <- 1:q
H <- abs(outer(times, times, "-"))
Sigma <- sigma.sq * rho^H
  
###########################
# Generate noise matrix E #
###########################
mu <- rep(0,q)
E <- mvrnorm(n, mu, Sigma)

##############################
# Generate response matrix Y #
##############################
Y <- crossprod(t(X),B.true) + E

#########################################
# Run the MBSP model on synthetic data. #
#########################################

# For optimal estimation, change max.steps to 15,000 
# and change burnin to be 5000

mbsp.model = mbsp.tpbn(X=X, Y=Y, max.steps = 2000, burnin=1000)

## Not run: 
################################
# Compute MSE_est and MSE_pred #
################################
MSE.est <- sum((mbsp.model$B.est-B.true)^2)/(p*q)
MSE.est # Note that in the paper it is scaled by factor of 100

MSE.pred <- sum((crossprod(t(X), mbsp.model$B.est - B.true))^2)/(n*q)
MSE.pred # Note that in the paper it is scaled by a factor of 100

################################
# Compute the FDR, FNR, and MP #
################################

# Initialize vector for classification of predictors i = 1,...,p
predictor.classifications <- rep(0, p)
# Initialize vector for true classifications of predictors i = 1,...,p
true.classifications <- rep(0,p)

# True active predictors in B.True
true.active.indices <- which(rowSums(B.true) != 0)
# Rest true signals as 1
true.classifications[true.active.indices] <- 1

# Active predictors according to our estimates
predictor.classifications[mbsp.model$active.predictors] <- 1

# Keep track of false positives and false negatives
false.pos <- length(which(predictor.classifications != 0 & true.classifications == 0))
tot.pos <- length(which(predictor.classifications != 0))
false.neg <- length(which(predictor.classifications == 0 & true.classifications != 0))
tot.neg <- length(which(predictor.classifications == 0))

# Compute FDR, FNR, and MP
fdr.rate <- false.pos/tot.pos
fnr.rate <- false.neg/tot.neg
mis.prob <- (false.pos+false.neg)/p

fdr.rate
fnr.rate
mis.prob

## End(Not run)




## Not run: 
#
#
#############################################
############################################# 
## MBSP analysis of yeast cell cycle       ##
## data set (Section 5.2 of Bai and Ghosh) ##              
#############################################
#############################################

# Load yeast data set
data(yeast)

# Set seed
set.seed(12345)

# Design matrix X and response matrix Y
X <- yeast$x
X <- scale(X, center=TRUE, scale=FALSE)
Y <- yeast$y
Y <- scale(Y, center=TRUE, scale=FALSE)

# Make sure they are matrices
X <- matrix(X, nrow=nrow(X))
Y <- matrix(Y, nrow=nrow(Y))

###################################
# Run the MBSP model on the yeast #
# cell cycle data set             #
###################################
mbsp.model = mbsp.tpbn(X=X,Y=Y)

# Display the active predictors (correspond to row indices in B)
mbsp.model$active.predictors

# Display names of four of the active TFs
colnames(yeast$x)[2]
colnames(yeast$x)[38]
colnames(yeast$x)[61]
colnames(yeast$x)[94]

# For horizontal axis (time)
time <- seq(from=0, to=119, by=7)

##############################################
# Open pdf to plot 4 of the active TFs.      #
# This reproduces Figure 1 of Bai and  Ghosh #
##############################################
pdf(file=file.path(tempdir(), "significantTFs.pdf"), width=5, height=5)
par(mfrow=c(2,2), mar=c(2,2,2,2))
plot(time, mbsp.model$B.est[2,], type="l", cex.axis = 0.5, xlab="", 
     main="ACE2", ylim=c(-0.6,0.6), ylab="")
abline(h=0)
lines(time, mbsp.model$lower.endpoints[2,], lty=2)
lines(time, mbsp.model$upper.endpoints[2,], lty=2)

plot(time, mbsp.model$B.est[38,], type="l", cex.axis = 0.5, xlab="", 
     main="HIR1", ylim=c(-0.6,0.6), ylab="")
abline(h=0)
lines(time, mbsp.model$lower.endpoints[38,], lty=2)
lines(time, mbsp.model$upper.endpoints[38,], lty=2)

plot(time, mbsp.model$B.est[61,], type="l", cex.axis = 0.5, xlab="",
     main="NDD1", ylim=c(-0.6,0.6), ylab="")
abline(h=0)
lines(time, mbsp.model$lower.endpoints[61,], lty=2)
lines(time, mbsp.model$upper.endpoints[61,], lty=2)

plot(time, mbsp.model$B.est[94,], type="l", cex.axis = 0.5, xlab="", 
     main="SWI6", ylim=c(-0.6,0.6), ylab="")
abline(h=0)
lines(time, mbsp.model$lower.endpoints[94,], lty=2)
lines(time, mbsp.model$upper.endpoints[94,], lty=2)
dev.off()

## End(Not run)

MBSP documentation built on April 9, 2018, 5:05 p.m.