Description Usage Arguments Details Value References Examples
This function generates MCMC posterior samples of the Bayesian linear regression model parameters, using only summary statistics X'X, X'Y and Y'Y (e.g. calculated by the function read.regress.data.ff()
in this package). The samples are generated according to the user specified choices of prior distributions, hyperprior distributions and fixed parameter values where required; the user also specifies starting values for unknown model parameters.
1 2 3 4 | bayes.regress(data.values=NULL,
beta.prior=list("flat"),
sigmasq.prior=list("inverse.gamma", 1.0, 1.0, 1.0),
Tsamp.out=1000, zero.intercept=FALSE)
|
data.values |
a list with four (optionally five) components, which are created by the function
|
beta.prior |
a list that specifies the characteristics of the prior distribution for β, the vector of coefficients of the Bayesian linear regression model. There are three possible types:
In each of these three prior types, the list has a different structure, as follows:
For all three of the above |
sigmasq.prior |
a list that specifies the characteristics of the prior distribution for σ^2 (the variance of ε_i, i.e. the variance of the error terms in the Bayesian linear regression model). There are two types:
Similar to
|
Tsamp.out |
an optional scalar that specifies the number of MCMC samples to generate; default = 1,000. |
zero.intercept |
an optional logical parameter with default = |
This function uses the following Bayesian linear regression model:
y_i=x_i' β + ε_i,
where i = 1,...,numsamp.data; ε_i ~ N(0,σ^2); k is the number of predictor variables. The function uses user-supplied prior distributions for β and σ^2.
The Gibbs sampler is used to sample from all full conditional posterior distributions, which only depend on the summary statistics X'X, X'Y and Y'Y (and Y'X = (X'Y)'); these summary statistics are calculated by the function read.regress.data.ff()
(in this package), or can be provided by the user. Starting values are not needed for the vector β, since this vector is updated first, conditioned on all other unknown model parameters and the data.
The full conditional posterior distributions are the following for each prior specification of β; these depend on the data only through summary statistics X'X and X'Y:
beta.prior=list(type = "flat")
:
β | σ^2, X, Y ~ Normal_{k+1} (mean=((X'X)^(-1)(X'Y), covariance=(σ^2(X'X)^(-1))))
beta.prior=list(type = "mvnorm.known")
:
β | σ^2, X, Y ~ Normal_{k+1} (mean=(C^(-1)+σ^(-2)(X'X))^(-1)(C^(-1)μ + σ^(-2)X'Y),covariance=(C^(-1)+σ^(-2)(X'X)^(-1)))
beta.prior=list(type = "mvnorm.unknown")
:
β | σ^2, μ, C^(-1), X, Y ~ Normal_{k+1} (mean=(C^(-1)+σ^(-2)(X'X))^(-1)(C^(-1)μ + σ^(-2)X'Y),covariance=(C^(-1)+σ^(-2)(X'X)^(-1)))
μ | β, σ^2, C^(-1), X, Y ~ Normal_{k+1} (mean=(D^(-1)+C^(-1))^(-1)(C^(-1)β+D^(-1)η), covariance=(D^(-1)+C^(-1))^(-1))
C^(-1) | β, σ^2, μ, X, Y ~ Wishart_{k+1} (d.f. = (1+λ), scale matrix = (V^(-1)+ (β - μ)(β - μ)')^(-1))
The full conditional posterior distributions are the following for each prior specification of σ^2; these depend on the data only through summary statistics X'X, X'Y and Y'Y:
sigmasq.prior=list(type = "inverse.gamma")
:
σ^2 | β, X, Y ~ Inverse Gamma (numsamp.data/2+a, (0.5(Y'Y-β'X'Y-Y'Xβ+β'X'Xβ)+\frac{1}{b})^(-1))
sigmasq.prior=list(type = "sigmasq.inverse")
:
σ^2 | β, X, Y ~ Inverse Gamma (numsamp.data/2, (0.5(Y'Y-β'X'Y-Y'Xβ+β'X'Xβ))^(-1))
The returned value is a list containing the MCMC samples of the unknown Bayesian linear regression model parameters; the number of MCMC samples is equal to the argument Tsamp.out
. Further analysis, including plotting and creating summary statistics, can be carried out using the 'coda'
R package (see References).
Carlin, B.P. and Louis, T.A. (2009) Bayesian Methods for Data Analysis, 3rd ed., Boca Raton, FL: Chapman and Hall/CRC Press.
Gelman, A., Carlin, J.B., Stern, H.S., Dunson, D.B., Vehtari, A. and Rubin, D.B. (2013) Bayesian Data Analysis, 3rd ed., Boca Raton, FL: Chapman and Hall/CRC Press.
Plummer, M., Best, N., Cowles, K. and Vines, K. (2006) CODA: Convergence diagnosis and output analysis for MCMC. R News, 6(1), 7-11.
Adler, D., Glaser, C., Nenadic, O., Oehlschlagel, J. and Zucchini, W. (2013) ff: memory-efficient storage of large data on disk and fast access functions. R package: https://CRAN.R-project.org/package=ff.
Fasiolo, M. (2014) An introduction to mvnfast. R package: https://CRAN.R-project.org/package=mvnfast.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 | ##################################################
## Simulate data
##################################################
set.seed(284698)
num.samp <- 100 # number of data values to simulate
# The first value of the beta vector is the y-intercept:
beta <- c(-0.33, 0.78, -0.29, 0.47, -1.25)
# Calculate the number of predictor variables:
num.pred <- length(beta)-1
rho <- 0.0 # correlation between predictors
mean.vec <- rep(0,num.pred)
sigma.mat <- matrix(rho,num.pred,num.pred) + diag(1-rho,num.pred)
sigmasq.sim <- 0.05
# Simulate predictor variables:
x.pre <- rmvn(num.samp, mu=mean.vec, sigma=sigma.mat)
# Add leading column of 1's to x.pre for y-intercept:
x <- cbind(rep(1,num.samp),x.pre)
epsilon <- rnorm(num.samp, mean=0, sd=sqrt(sigmasq.sim))
y <- as.numeric( x %*% as.matrix(beta) + epsilon)
## Compute summary statistics (alternatively, the
# "read.regress.data.ff() function (in this package) can be
# used to calculate summary statistics; see example below).
xtx <- t(x)%*%x
xty <- t(x)%*%y
yty <- t(y)%*%y
data.values<-list(xtx=xtx, xty=xty, yty=yty,
numsamp.data = num.samp,
xtx.inv = chol2inv(chol(xtx)))
##########################################################
## Bayesian linear regression analysis
##########################################################
Tsamp.out <- 100 # number of MCMC samples to produce
## Choose priors for beta and sigma-squared. Here,
# beta: Uniform prior; sigma-squared: Inverse Gamma prior.
beta.prior <- list( type = "flat")
sigmasq.prior <- list(type = "inverse.gamma", inverse.gamma.a = 1.0,
inverse.gamma.b = 1.0, sigmasq.init = 1.0 )
set.seed(284698)
# Run the "bayes.regress()" function using the data simulated above.
MCMC.out <- bayes.regress(data.values,
beta.prior,
sigmasq.prior = sigmasq.prior,
Tsamp.out = Tsamp.out)
# Next, print the posterior means of the unknown model parameters.
# Alternatively, the "coda" package can be used for analysis.
print(c(colMeans(MCMC.out$beta), mean(MCMC.out$sigmasq)))
# Check that output is close to simulated values (although num.samp and
# Tsamp.out are small here); note that the output includes both beta and
# sigmasq:
# c(-0.33, 0.78, -0.29, 0.47, -1.25, 0.05)
## Run all 6 combinations of priors for 3 "beta.prior" choices and
# 2 "sigmasq.prior" choices:
beta.priors <- list(
list( type = "flat"),
list( type = "mvnorm.known",
mean.mu = rep(0.0, (num.pred+1)),
prec.Cinv = diag(1.0, (num.pred+1))),
list( type = "mvnorm.unknown",
mu.hyper.mean.eta = rep(0.0,(num.pred+1)),
mu.hyper.prec.Dinv = diag(1.0, (num.pred+1)),
Cinv.hyper.df.lambda = (num.pred+1),
Cinv.hyper.invscale.Vinv = diag(1.0, (num.pred+1)),
mu.init = rep(1.0, (num.pred+1)),
Cinv.init = diag(1.0,(num.pred+1)))
)
sigmasq.priors <- list(
list(type = "inverse.gamma",
inverse.gamma.a = 1.0,
inverse.gamma.b = 1.0,
sigmasq.init = 0.1 ),
list( type="sigmasq.inverse", sigmasq.init = 0.1)
)
for (beta.prior in beta.priors)
{
for(sigmasq.prior in sigmasq.priors)
{
set.seed(284698)
MCMC.out <- bayes.regress(data.values,
beta.prior,
sigmasq.prior = sigmasq.prior,
Tsamp.out = Tsamp.out)
print(c(colMeans(MCMC.out$beta), mean(MCMC.out$sigmasq)))
}
}
# Check that output is close to simulated values (although num.samp and
# Tsamp.out are small here); note that the output includes both beta and
# sigmasq:
# c(-0.33, 0.78, -0.29, 0.47, -1.25, 0.05):
#######################################################################
## Read the data from a file, calculate the summary statistics and run
## the Bayesian linear regression analysis
#######################################################################
Tsamp.out <- 100
## Assume non-zero y-intercept data.
# Read the files and compute summary statistics using the "read.regress.data.ff()"
# function (in this package).
filename <- system.file('data/regressiondata.nz.all.csv.gz', package='BayesSummaryStatLM')
data.values <- read.regress.data.ff(filename)
# Calculate the number of predictors.
num.pred <- length(data.values$xty)-1
## Run all 6 combinations of priors for 3 "beta.prior" choices and
# 2 "sigmasq.prior" choices:
beta.priors <- list(
list( type = "flat"),
list( type = "mvnorm.known",
mean.mu = rep(0.0, (num.pred+1)),
prec.Cinv = diag(1.0, (num.pred+1))),
list( type="mvnorm.unknown",
mu.hyper.mean.eta = rep(0.0, (num.pred+1)),
mu.hyper.prec.Dinv = diag(1.0, (num.pred+1)),
Cinv.hyper.df.lambda = (num.pred+1),
Cinv.hyper.invscale.Vinv = diag(1.0, (num.pred+1)),
mu.init = rep(1.0, (num.pred+1)),
Cinv.init = diag(1.0,(num.pred+1)))
)
sigmasq.priors <- list(
list(type = "inverse.gamma", inverse.gamma.a = 1.0,
inverse.gamma.b = 1.0, sigmasq.init = 0.5 ),
list( type = "sigmasq.inverse", sigmasq.init = 0.5)
)
for (beta.prior in beta.priors)
{
for(sigmasq.prior in sigmasq.priors)
{
set.seed(284698)
MCMC.out <- bayes.regress(data.values,
beta.prior,
sigmasq.prior = sigmasq.prior,
Tsamp.out = Tsamp.out)
print(c(colMeans(MCMC.out$beta), mean(MCMC.out$sigmasq)))
}
}
# Check that output is close to simulated values (although num.samp and
# Tsamp.out are small here); note that the output includes both beta and
# sigmasq:
# c( 0.76, -0.92, 0.64, 0.57, -1.65, 0.25)
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.