# mbsp.tpbn: MBSP Model with Three Parameter Beta Normal (TPBN) Family In MBSP: Multivariate Bayesian Model with Shrinkage Priors

## Description

This function provides a fully Bayesian approach for obtaining a sparse estimate of the p \times q coefficients matrix B in the multivariate linear regression model,

Y = XB+E,

using the three parameter beta normal (TPBN) family.

## Usage

 1 mbsp.tpbn(X, Y, u=0.5, a=0.5, tau=NA, max.steps=15000, burnin=5000) 

## Arguments

 X design matrix of n samples and p covariates. Y response matrix of n samples and q response variables. u The first parameter in the TPBN family. Defaults to u=0.5 for the horseshoe prior. a The second parameter in the TPBN family. Defaults to a=0.5 for the horseshoe prior. tau The global parameter. If the user does not specify this (tau=NA), the Gibbs sampler will use τ = 1/(p*n*log(n)). The user may also specify a value for τ between 0 and 1, otherwise it defaults to τ = 1/(p*n*log(n)). max.steps The total number of iterations to run in the Gibbs sampler. Defaults to 15,000. burnin The number of burn-in iterations for the Gibbs sampler. Defaults to 5,000.

## Details

The function performs sparse estimation of B and variable selection from the p covariates. The lower and upper endpoints of the 95 percent posterior credible intervals for each of the pq elements of B are also returned so that the user may assess uncertainty quantification.

In the three parameter beta normal (TPBN) family, (u,a)=(0.5,0.5) corresponds to the horseshoe prior, (u,a)=(1,0.5) corresponds to the Strawderman-Berger prior, and (u,a)=(1,a), a > 0 corresponds to the normal-exponential-gamma (NEG) prior. This function uses the horseshoe prior as the default shinkrage prior.

## Value

The function returns a list containing the following components:

 B.est the point estimate of the p \times q matrix B (taken as the componentwise posterior median for all pq entries). lower.endpoints The 2.5th percentile of the posterior density (or the lower endpoint of the 95 percent credible interval) for all pq entries of B. upper.endpoints The 97.5th percentile of the posterior density (or the upper endpoint of the 95 percent credible interval) for all pq entries of B. active.predictors The active (nonzero) covariates chosen by our model from the p total predictors.

## Author(s)

Ray Bai and Malay Ghosh

## References

Armagan, A., Clyde, M., and Dunson, D.B. (2011) Generalized Beta Mixtures of Gaussians. In J. Shawe-taylor, R. Zemel, P. Bartlett, F. Pereira, and K. Weinberger (Eds.) Advances in Neural Information Processing Systems 24, 523-531.

Bai, R. and Ghosh, M. (2018). High-Dimensional Multivariate Posterior Consistency Under Global-Local Shrinkage Priors. Revision at Journal of Multivariate Analysis. arXiv:1711.07635

Berger, J. (1980). A Robust Generalized Bayes Estimator and Confidence Region for a Multivariate Normal Mean. Annals of Statistics, 8(4): 716-761.

Carvalho, C.M., Polson, N.G., and Scott., J.G. (2010). The Horseshoe Estimator for Sparse Signals. Biometrika, 97(2): 465-480. Strawderman, W.E. (1971). Proper Bayes Minimax Estimators of the Multivariate Normal Mean. Annals of Mathematical STatistics, 42(1): 385-388.

## Examples

  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 ############################################# ############################################# ## EXAMPLE ON SYNTHETIC DATA: ## ## Can change n, p, q, p.act, max.steps, ## ## and burnin below to reproduce the ## ## simulations from Section 5.1 of Bai ## ## and Ghosh ## ############################################# ############################################# n <- 50 p <- 10 q <- 3 # Active number of predictors is 2 p.act <- 2 ############################# # Generate design matrix X. # ############################# times <- 1:p rho <- 0.5 H <- abs(outer(times, times, "-")) V <- rho^H mu <- rep(0, p) # Rows of X are simulated from MVN(0,V) X <- mvrnorm(n, mu, V) # Center X X <- scale(X, center=TRUE, scale=FALSE) ######################################### # Generate true coefficient matrix B_0. # ######################################### # Entries in nonzero rows are drawn from Unif[(-5,-0.5)U(0.5,5)] B.act <- runif(p.act*q,-5,4) disjoint <- function(x){ if(x <= -0.5) return(x) else return(x+1) } B.act <- matrix(sapply(B.act, disjoint),p.act,q) # Set rest of the rows equal to 0 B.true <- rbind(B.act,matrix(0,p-p.act,q)) B.true <- B.true[sample(1:p),] # permute the rows ######################################## # Generate true error covariance Sigma # ######################################## sigma.sq=2 times <- 1:q H <- abs(outer(times, times, "-")) Sigma <- sigma.sq * rho^H ########################### # Generate noise matrix E # ########################### mu <- rep(0,q) E <- mvrnorm(n, mu, Sigma) ############################## # Generate response matrix Y # ############################## Y <- crossprod(t(X),B.true) + E ######################################### # Run the MBSP model on synthetic data. # ######################################### # For optimal estimation, change max.steps to 15,000 # and change burnin to be 5000 mbsp.model = mbsp.tpbn(X=X, Y=Y, max.steps = 2000, burnin=1000) ## Not run: ################################ # Compute MSE_est and MSE_pred # ################################ MSE.est <- sum((mbsp.model$B.est-B.true)^2)/(p*q) MSE.est # Note that in the paper it is scaled by factor of 100 MSE.pred <- sum((crossprod(t(X), mbsp.model$B.est - B.true))^2)/(n*q) MSE.pred # Note that in the paper it is scaled by a factor of 100 ################################ # Compute the FDR, FNR, and MP # ################################ # Initialize vector for classification of predictors i = 1,...,p predictor.classifications <- rep(0, p) # Initialize vector for true classifications of predictors i = 1,...,p true.classifications <- rep(0,p) # True active predictors in B.True true.active.indices <- which(rowSums(B.true) != 0) # Rest true signals as 1 true.classifications[true.active.indices] <- 1 # Active predictors according to our estimates predictor.classifications[mbsp.model$active.predictors] <- 1 # Keep track of false positives and false negatives false.pos <- length(which(predictor.classifications != 0 & true.classifications == 0)) tot.pos <- length(which(predictor.classifications != 0)) false.neg <- length(which(predictor.classifications == 0 & true.classifications != 0)) tot.neg <- length(which(predictor.classifications == 0)) # Compute FDR, FNR, and MP fdr.rate <- false.pos/tot.pos fnr.rate <- false.neg/tot.neg mis.prob <- (false.pos+false.neg)/p fdr.rate fnr.rate mis.prob ## End(Not run) ## Not run: # # ############################################# ############################################# ## MBSP analysis of yeast cell cycle ## ## data set (Section 5.2 of Bai and Ghosh) ## ############################################# ############################################# # Load yeast data set data(yeast) # Set seed set.seed(12345) # Design matrix X and response matrix Y X <- yeast$x X <- scale(X, center=TRUE, scale=FALSE) Y <- yeast$y Y <- scale(Y, center=TRUE, scale=FALSE) # Make sure they are matrices X <- matrix(X, nrow=nrow(X)) Y <- matrix(Y, nrow=nrow(Y)) ################################### # Run the MBSP model on the yeast # # cell cycle data set # ################################### mbsp.model = mbsp.tpbn(X=X,Y=Y) # Display the active predictors (correspond to row indices in B) mbsp.model$active.predictors # Display names of four of the active TFs colnames(yeast$x)[2] colnames(yeast$x)[38] colnames(yeast$x)[61] colnames(yeast$x)[94] # For horizontal axis (time) time <- seq(from=0, to=119, by=7) ############################################## # Open pdf to plot 4 of the active TFs. # # This reproduces Figure 1 of Bai and Ghosh # ############################################## pdf(file=file.path(tempdir(), "significantTFs.pdf"), width=5, height=5) par(mfrow=c(2,2), mar=c(2,2,2,2)) plot(time, mbsp.model$B.est[2,], type="l", cex.axis = 0.5, xlab="", main="ACE2", ylim=c(-0.6,0.6), ylab="") abline(h=0) lines(time, mbsp.model$lower.endpoints[2,], lty=2) lines(time, mbsp.model$upper.endpoints[2,], lty=2) plot(time, mbsp.model$B.est[38,], type="l", cex.axis = 0.5, xlab="", main="HIR1", ylim=c(-0.6,0.6), ylab="") abline(h=0) lines(time, mbsp.model$lower.endpoints[38,], lty=2) lines(time, mbsp.model$upper.endpoints[38,], lty=2) plot(time, mbsp.model$B.est[61,], type="l", cex.axis = 0.5, xlab="", main="NDD1", ylim=c(-0.6,0.6), ylab="") abline(h=0) lines(time, mbsp.model$lower.endpoints[61,], lty=2) lines(time, mbsp.model$upper.endpoints[61,], lty=2) plot(time, mbsp.model$B.est[94,], type="l", cex.axis = 0.5, xlab="", main="SWI6", ylim=c(-0.6,0.6), ylab="") abline(h=0) lines(time, mbsp.model$lower.endpoints[94,], lty=2) lines(time, mbsp.model$upper.endpoints[94,], lty=2) dev.off() ## End(Not run) 

MBSP documentation built on April 9, 2018, 5:05 p.m.