# nbp: Normal-Beta Prime (NBP) Regression

In NormalBetaPrime: Normal Beta Prime Prior

## Description

This function implements the Monte Carlo EM (Gibbs sampling) approach for the normal-beta prime (NBP) model in the standard linear regression model,

y = X β + ε,

where ε \sim N_n(0, σ^2 I_n). This is achieved by placing the normal-beta prime (NBP) prior of Bai and Ghosh (2019) on the coefficients β. In the case where p > n, we utilize an efficient sampler from Bhattacharya et al. (2016) that reduces the computational cost of sampling from the full conditional density of β to O(n^2 p). The hyperparameters can be set deterministically by the user, or they may be selected automatically by marginal maximum likelihood (MML).
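
The O(n^2 p) sampler of Bhattacharya et al. (2016) referenced above can be sketched in a few lines of R. This is an illustrative re-implementation of the published algorithm, not the package's internal code; the function name `fast.beta.sample` is hypothetical.

```r
# Exact sampler of Bhattacharya, Chakraborty, and Mallick (2016) for a draw
# from N(A^{-1} Phi' alpha, A^{-1}) with A = Phi'Phi + D^{-1}, D = diag(d).
# Only an n x n system is solved, giving O(n^2 p) cost instead of O(p^3).
fast.beta.sample <- function(Phi, alpha, d) {
  n <- nrow(Phi)
  p <- ncol(Phi)
  u     <- rnorm(p, mean = 0, sd = sqrt(d))  # u ~ N(0, D)
  delta <- rnorm(n)                          # delta ~ N(0, I_n)
  v     <- Phi %*% u + delta
  # Solve the n x n system (Phi D Phi' + I_n) w = alpha - v
  M <- Phi %*% (d * t(Phi)) + diag(n)
  w <- solve(M, alpha - v)
  as.vector(u + d * (t(Phi) %*% w))  # theta ~ N(A^{-1} Phi' alpha, A^{-1})
}

# For the NBP full conditional of beta, one would take Phi = X / sigma,
# alpha = y / sigma, and d = sigma^2 * omega^2 (the prior variances).
```

The key identity is that the returned draw has exactly the target Gaussian distribution, while the expensive p x p factorization is replaced by an n x n solve.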

Note that the Gibbs sampling implementation is slower than the variational Bayes (VB) function, nbp.VB, but nbp tends to give much more accurate point estimates and posterior approximations. For this reason, nbp is recommended.

## Usage

```r
nbp(X, y, method.hyperparameters=c("fixed","mml"), a=0.5, b=0.5,
    c=1e-5, d=1e-5, selection=c("dss","intervals"),
    max.steps=15000, burnin=10000)
```

## Arguments

- `X`: n \times p design matrix. Should be centered.
- `y`: n \times 1 response vector. Should be centered.
- `method.hyperparameters`: The method for estimating the shape parameters (a, b). If `"mml"` is chosen, the function obtains the marginal maximum likelihood (MML) estimates of (a, b) by the EM algorithm described in Bai and Ghosh (2019). If `"fixed"` is chosen, then (a, b) are not estimated from the data.
- `a`: Shape parameter for β'(a,b). The default is 0.5. The user may specify a different value for a (a > 0). This is ignored if the method for estimating hyperparameters is `"mml"`.
- `b`: Shape parameter for β'(a,b). The default is 0.5. The user may specify a different value for b (b > 0). This is ignored if the method for estimating hyperparameters is `"mml"`.
- `c`: The shape parameter for the IG(c,d) prior on the unknown variance parameter σ^2. The default is 10^{-5}.
- `d`: The rate parameter for the IG(c,d) prior on the unknown variance parameter σ^2. The default is 10^{-5}.
- `selection`: The method of variable selection. `"dss"` implements the decoupling shrinkage and selection (DSS) method of Hahn and Carvalho (2015) to select variables. `"intervals"` performs variable selection by examining the 95 percent posterior credible interval for each coefficient β_i, i = 1, …, p.
- `max.steps`: The total number of iterations to run in the Gibbs sampler. Defaults to 15,000.
- `burnin`: The number of burn-in iterations for the Gibbs sampler. Defaults to 10,000.

## Details

The function implements the normal-beta prime (NBP) model of Bai and Ghosh (2019) using Gibbs sampling. The posterior variances and 95 percent credible intervals for each of the p covariates are also returned so that the user may quantify uncertainty. The full model is:

Y | (X, β ) \sim N_n(X β, σ^2 I_n),

β_i | ω_i^2 \sim N(0, σ^2 ω_i^2), i = 1, ..., p,

ω_i^2 \sim β'(a,b), i = 1, ..., p,

σ^2 \sim IG(c,d),

where β'(a,b) denotes the beta prime density,

π(ω_i^2) = \frac{Γ(a+b)}{Γ(a) Γ(b)} (ω_i^2)^{a-1} (1+ω_i^2)^{-a-b}.
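
As a quick check on the density formula above, it can be coded directly and verified numerically. This snippet is illustrative only and not part of the package; `dbetaprime` is a hypothetical helper name.

```r
# Beta prime density pi(x) = Gamma(a+b)/(Gamma(a)Gamma(b)) * x^(a-1) * (1+x)^(-a-b),
# computed on the log scale for numerical stability.
dbetaprime <- function(x, a, b) {
  exp(lgamma(a + b) - lgamma(a) - lgamma(b) +
      (a - 1) * log(x) - (a + b) * log(1 + x))
}

a <- 0.5; b <- 0.5  # the package defaults

# The density integrates to 1 (the endpoint singularity at 0 is integrable)
integrate(function(x) dbetaprime(x, a, b), 0, Inf)$value  # approximately 1

# Equivalent way to draw omega_i^2: if Z ~ Beta(a, b), then Z/(1-Z) ~ beta-prime(a, b)
z <- rbeta(10000, a, b)
omega2 <- z / (1 - z)
```

The transformation at the end is how beta prime variates are typically simulated, since the density itself has no standard generator in base R.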

## Value

The function returns a list containing the following components:

- `beta.hat`: The posterior mean estimate of β.
- `beta.med`: The posterior median estimate of β.
- `beta.var`: The posterior variance estimates for each β_i, i = 1, …, p.
- `beta.intervals`: The 95 percent credible intervals for all p estimates in β.
- `nbp.classifications`: A p-dimensional binary vector with "1" if the covariate is selected and "0" if it is deemed irrelevant.
- `sigma2.esimate`: Estimate of the unknown variance component σ^2.
- `a.estimate`: MML estimate of shape parameter a. If a was fixed a priori, returns the fixed a.
- `b.estimate`: MML estimate of shape parameter b. If b was fixed a priori, returns the fixed b.

## Author(s)

Ray Bai and Malay Ghosh

## References

Bai, R. and Ghosh, M. (2019). "On the beta prime prior for scale parameters in high-dimensional Bayesian regression models." Pre-print, arXiv:1807.06539.

Bhattacharya, A., Chakraborty, A., and Mallick, B.K. (2016). "Fast sampling with Gaussian scale mixture priors in high-dimensional regression." Biometrika, 103(4): 985-991.

Hahn, P. R. and Carvalho, C. M. (2015). "Decoupling shrinkage and selection in Bayesian linear models: A posterior summary perspective." Journal of the American Statistical Association, 110(509):435-448.

## Examples

```r
###############################
## Example on diabetes data. ##
###############################
data(diabetes)
attach(diabetes)

X <- scale(diabetes$x)  # Center and scale X
y <- scale(diabetes$y)  # Center and scale y

################################
# Fit the NBP regression model #
################################
# Should use default of 15,000 for max.steps and 10,000 for burnin
nbp.model <- nbp(X=X, y=y, method.hyperparameters="mml",
                 max.steps=5000, burnin=2500, selection="dss")

nbp.model$beta.med             # posterior median estimates
nbp.model$a.estimate           # MML estimate of shape parameter 'a'
nbp.model$b.estimate           # MML estimate of shape parameter 'b'
nbp.model$beta.intervals       # 95 percent posterior credible intervals
nbp.model$nbp.classifications  # Variables selected

####################################################
# TRIM32 gene expression data analysis.            #
# Running this code will allow you to reproduce    #
# the results in Section 7 of Bai and Ghosh (2019) #
####################################################

# Load the data
data(eyedata)

# Set seed
set.seed(1)

# Center and scale design matrix X and response vector y
X <- scale(genes, center=TRUE, scale=TRUE)   # gene expression data (covariates)
y <- scale(trim32, center=TRUE, scale=TRUE)  # levels of TRIM32 (response)

###########################
# Implement the NBP model #
###########################
nbp.model <- nbp(X, y, method.hyperparameters="mml", selection="dss")

# Variables selected
active.indices <- which(nbp.model$nbp.classifications != 0)  # active genes
active.estimates <- nbp.model$beta.med[active.indices]
active.CIs <- nbp.model$beta.intervals[, active.indices]

###################################
# Evaluate predictive performance #
###################################
k <- 5  # Number of folds

# Randomly assign indices to the k different folds
folds <- split(sample(nrow(X), nrow(X), replace=FALSE), as.factor(1:k))

# To store the mean squared prediction error (MSPE)
mspe.nbp <- rep(NA, k)

for(i in 1:k){
  # Split data into training set and test set based on folds
  test_ind <- folds[[i]]

  # 80 percent training set
  y.train <- y[-test_ind]
  X.train <- X[-test_ind, ]

  # Rest is test set
  y.test <- y[test_ind]
  X.test <- X[test_ind, ]

  # Run NBP model on training set
  nbp.train <- nbp(X=X.train, y=y.train, method.hyperparameters="mml",
                   selection="dss")
  beta.train <- nbp.train$beta.med

  # Obtain predictions on test set
  nbp.pred <- crossprod(t(X.test), beta.train)

  # Compute the MSPE on test set
  mspe.nbp[i] <- mean((nbp.pred - y.test)^2)
}

mean.nbp.mspe <- mean(mspe.nbp)
mean.nbp.mspe
```

NormalBetaPrime documentation built on May 2, 2019, 1:39 p.m.