Home

/

CRAN

/

bayes.regress: MCMC posterior sampling of Bayesian linear regression model...

bayes.regress: MCMC posterior sampling of Bayesian linear regression model...
In BayesSummaryStatLM: MCMC Sampling of Bayesian Linear Models via Summary Statistics

Description Usage Arguments Details Value References Examples

This function generates MCMC posterior samples of the Bayesian linear regression model parameters, using only summary statistics X'X, X'Y and Y'Y (e.g. calculated by the function read.regress.data.ff() in this package). The samples are generated according to the user specified choices of prior distributions, hyperprior distributions and fixed parameter values where required; the user also specifies starting values for unknown model parameters.

bayes.regress(data.values=NULL, 
              beta.prior=list("flat"), 
              sigmasq.prior=list("inverse.gamma", 1.0, 1.0, 1.0),
              Tsamp.out=1000, zero.intercept=FALSE)

`data.values`	a list with four (optionally five) components, which are created by the function `read.regress.data.ff()` (in this package): `xtx`: a square matrix that stores the product X'X, where X is the data from predictor columns with a leading column of 1's for the y-intercept term. `xty`: a column vector that stores the product X'Y, where X is the same as above and Y is a column of response data values. `yty`: a scalar value that stores the product Y'Y, where Y is the same as above. `numsamp.data`: an integer equal to the number of data values of the predictor variables X. `xtx.inv` (optional): the inverse of the matrix `xtx` that is used for the “Uniform” prior distribution for β to speed up computations if the function is used repeatedly with the same `xtx`. If omitted, this inverse will be computed automatically. This component is ignored for other prior distributions.
`beta.prior`	a list that specifies the characteristics of the prior distribution for β, the vector of coefficients of the Bayesian linear regression model. There are three possible types: `flat`: Uniform distribution. `mvnorm.known`: Multivariate Normal with known mean vector μ and known covariance matrix C. `mvnorm.unknown`: Multivariate Normal with unknown mean vector μ and unknown covariance matrix C. This prior also includes the hyperpriors for μ and C, where μ ~ Multivariate Normal(η, D), and C^(-1) ~ Wishart(d.f. = λ, scale matrix = V); η, D, λ, V assumed known. In each of these three prior types, the list has a different structure, as follows: `beta.prior=list(type = "flat")`: a Uniform prior distribution for β; no other specification is necessary. This prior distribution is used by default. `beta.prior=list(type = "mvnorm.known", mean.mu = ..., cov.C = ..., prec.Cinv = ... )` `mean.mu`: the fixed known prior mean vector μ for the Multivariate Normal prior of β. The default is a vector of 0's with length equal to the length of β. `cov.C`: the fixed known prior covariance matrix C for the Multivariate Normal prior of β. The default is an identity matrix with dimension equal to the length of β. `prec.Cinv`: the inverse of the covariance matrix C above. If `cov.C` is not specified, `prec.Cinv` is assigned the identity matrix by default, with dimension equal to the length of β. It is advised to supply `prec.Cinv` matrix and omit `cov.C` for speeding up the algorithm. In case both are supplied, the algorithm gives preference to `prec.Cinv`. `beta.prior=list(type = "mvnorm.unknown", mu.hyper.mean.eta = ..., mu.hyper.prec.Dinv = ..., Cinv.hyper.df.lambda = ..., Cinv.hyper.invscale.Vinv = ..., mu.init = ..., Cinv.init = ...)` `mu.hyper.mean.eta`: the fixed known hyperparameter mean vector η for the Multivariate Normal hyperprior mean μ. The default is a vector of 0's with length equal to the length of β. `mu.hyper.prec.Dinv`: the fixed known hyperparameter precision matrix D^(-1) for the Multivariate Normal hyperprior mean μ. The default is an identity matrix with dimension equal to the length of β. `Cinv.hyper.df.lambda`: the fixed known degrees of freedom λ for the Wishart hyperprior for C^(-1). The default value is the length of β . `Cinv.hyper.invscale.Vinv`: the fixed known hyperparameter inverse scale matrix V^(-1) for the Wishart hyperprior for C^(-1). The default is an identity matrix with dimension equal to the length of β. `mu.init`: initial value for μ for the MCMC chain. The default is a vector of 1's with length equal to the length of β. `Cinv.init`: initial value for C^(-1) for the MCMC chain. The default is an identity matrix with dimension equal to the length of β. For all three of the above `beta.prior` distributions, only the `type` is mandatory; the remaining parameters are assigned default values if omitted.
`sigmasq.prior`	a list that specifies the characteristics of the prior distribution for σ^2 (the variance of ε_i, i.e. the variance of the error terms in the Bayesian linear regression model). There are two types: `inverse.gamma`: Inverse Gamma distribution with known shape and scale parameters a and b, respectively. `sigmasq.inverse`: inverse sigma-squared distribution. Similar to `beta.prior` above, the structure of the list depends on the type of prior distribution chosen. The list must be supplied in either of the following structures: `sigmasq.prior=list(type = "inverse.gamma", inverse.gamma.a = ..., inverse.gamma.b = ..., sigmasq.init = ...)` `inverse.gamma.a`: the shape parameter a for the Inverse Gamma prior distribution, assumed known; default = 1. `inverse.gamma.b`: the scale parameter b for the Inverse Gamma prior distribution, assumed known; default = 1. `sigmasq.init`: the initial value for the unknown σ^2 parameter for the MCMC chain; default = 1. `sigmasq.prior=list(type="sigmasq.inverse", sigmasq.init = ...)`. `sigmasq.init`: the initial value for the unknown σ^2 parameter for the MCMC chain; default = 1.
`Tsamp.out`	an optional scalar that specifies the number of MCMC samples to generate; default = 1,000.
`zero.intercept`	an optional logical parameter with default = `FALSE`. If `zero.intercept = TRUE` is specified, the linear regression model sets the y-intercept term β_0 to zero; the corresponding y-intercept terms of the matrices `data.values$xtx` and `data.values$xty` are ignored, and the β vector is revised throughout the models and output automatically by the function.

This function uses the following Bayesian linear regression model:

y_i=x_i' β + ε_i,

where i = 1,...,numsamp.data; ε_i ~ N(0,σ^2); k is the number of predictor variables. The function uses user-supplied prior distributions for β and σ^2.

The Gibbs sampler is used to sample from all full conditional posterior distributions, which only depend on the summary statistics X'X, X'Y and Y'Y (and Y'X = (X'Y)'); these summary statistics are calculated by the function read.regress.data.ff() (in this package), or can be provided by the user. Starting values are not needed for the vector β, since this vector is updated first, conditioned on all other unknown model parameters and the data.

The full conditional posterior distributions are the following for each prior specification of β; these depend on the data only through summary statistics X'X and X'Y:
- beta.prior=list(type = "flat"):
  
  β | σ^2, X, Y ~ Normal_{k+1} (mean=((X'X)^(-1)(X'Y), covariance=(σ^2(X'X)^(-1))))
- beta.prior=list(type = "mvnorm.known"):
  
  β | σ^2, X, Y ~ Normal_{k+1} (mean=(C^(-1)+σ^(-2)(X'X))^(-1)(C^(-1)μ + σ^(-2)X'Y),covariance=(C^(-1)+σ^(-2)(X'X)^(-1)))
- beta.prior=list(type = "mvnorm.unknown"):
  
  β | σ^2, μ, C^(-1), X, Y ~ Normal_{k+1} (mean=(C^(-1)+σ^(-2)(X'X))^(-1)(C^(-1)μ + σ^(-2)X'Y),covariance=(C^(-1)+σ^(-2)(X'X)^(-1)))
  
  μ | β, σ^2, C^(-1), X, Y ~ Normal_{k+1} (mean=(D^(-1)+C^(-1))^(-1)(C^(-1)β+D^(-1)η), covariance=(D^(-1)+C^(-1))^(-1))
  
  C^(-1) | β, σ^2, μ, X, Y ~ Wishart_{k+1} (d.f. = (1+λ), scale matrix = (V^(-1)+ (β - μ)(β - μ)')^(-1))

The full conditional posterior distributions are the following for each prior specification of σ^2; these depend on the data only through summary statistics X'X, X'Y and Y'Y:
- sigmasq.prior=list(type = "inverse.gamma"):
  
  σ^2 | β, X, Y ~ Inverse Gamma (numsamp.data/2+a, (0.5(Y'Y-β'X'Y-Y'Xβ+β'X'Xβ)+\frac{1}{b})^(-1))
- sigmasq.prior=list(type = "sigmasq.inverse"):
  
  σ^2 | β, X, Y ~ Inverse Gamma (numsamp.data/2, (0.5(Y'Y-β'X'Y-Y'Xβ+β'X'Xβ))^(-1))

The returned value is a list containing the MCMC samples of the unknown Bayesian linear regression model parameters; the number of MCMC samples is equal to the argument Tsamp.out. Further analysis, including plotting and creating summary statistics, can be carried out using the 'coda' R package (see References).

Carlin, B.P. and Louis, T.A. (2009) Bayesian Methods for Data Analysis, 3rd ed., Boca Raton, FL: Chapman and Hall/CRC Press.

Gelman, A., Carlin, J.B., Stern, H.S., Dunson, D.B., Vehtari, A. and Rubin, D.B. (2013) Bayesian Data Analysis, 3rd ed., Boca Raton, FL: Chapman and Hall/CRC Press.

Plummer, M., Best, N., Cowles, K. and Vines, K. (2006) CODA: Convergence diagnosis and output analysis for MCMC. R News, 6(1), 7-11.

Adler, D., Glaser, C., Nenadic, O., Oehlschlagel, J. and Zucchini, W. (2013) ff: memory-efficient storage of large data on disk and fast access functions. R package: https://CRAN.R-project.org/package=ff.

Fasiolo, M. (2014) An introduction to mvnfast. R package: https://CRAN.R-project.org/package=mvnfast.

##################################################
## Simulate data 
##################################################

set.seed(284698)

num.samp  <- 100 # number of data values to simulate

# The first value of the beta vector is the y-intercept:
beta <- c(-0.33, 0.78, -0.29, 0.47, -1.25)

# Calculate the number of predictor variables:
num.pred <- length(beta)-1

rho       <- 0.0  # correlation between predictors
mean.vec  <- rep(0,num.pred)
sigma.mat <- matrix(rho,num.pred,num.pred) + diag(1-rho,num.pred)
sigmasq.sim <- 0.05

# Simulate predictor variables:
x.pre       <- rmvn(num.samp, mu=mean.vec, sigma=sigma.mat)       

# Add leading column of 1's to x.pre for y-intercept:
x <- cbind(rep(1,num.samp),x.pre)

epsilon <- rnorm(num.samp, mean=0, sd=sqrt(sigmasq.sim))

y  <- as.numeric( x %*% as.matrix(beta) +  epsilon)

## Compute summary statistics (alternatively, the
# "read.regress.data.ff() function (in this package) can be 
# used to calculate summary statistics; see example below).

xtx <- t(x)%*%x 
xty <- t(x)%*%y 
yty <- t(y)%*%y 

data.values<-list(xtx=xtx, xty=xty, yty=yty,
                  numsamp.data = num.samp, 
                  xtx.inv = chol2inv(chol(xtx)))

##########################################################
## Bayesian linear regression analysis
##########################################################

Tsamp.out <- 100 # number of MCMC samples to produce

## Choose priors for beta and sigma-squared. Here,
# beta: Uniform prior; sigma-squared: Inverse Gamma prior. 

beta.prior    <- list( type = "flat")
sigmasq.prior <- list(type = "inverse.gamma", inverse.gamma.a = 1.0, 
                      inverse.gamma.b = 1.0, sigmasq.init = 1.0 )

set.seed(284698)

# Run the "bayes.regress()" function using the data simulated above.

MCMC.out <- bayes.regress(data.values, 
                          beta.prior, 
                          sigmasq.prior = sigmasq.prior, 
                          Tsamp.out = Tsamp.out)

# Next, print the posterior means of the unknown model parameters.
# Alternatively, the "coda" package can be used for analysis.

print(c(colMeans(MCMC.out$beta), mean(MCMC.out$sigmasq)))

# Check that output is close to simulated values (although num.samp and
# Tsamp.out are small here); note that the output includes both beta and 
# sigmasq:
# c(-0.33,  0.78, -0.29,  0.47, -1.25,  0.05)

## Run all 6 combinations of priors for 3 "beta.prior" choices and 
#  2 "sigmasq.prior" choices:

beta.priors <- list(
  list( type = "flat"),
  
  list( type = "mvnorm.known", 
        mean.mu = rep(0.0,    (num.pred+1)), 
        prec.Cinv = diag(1.0, (num.pred+1))),
        
  list( type = "mvnorm.unknown",
        mu.hyper.mean.eta         = rep(0.0,(num.pred+1)),  
        mu.hyper.prec.Dinv        = diag(1.0, (num.pred+1)),  
        Cinv.hyper.df.lambda      = (num.pred+1), 
        Cinv.hyper.invscale.Vinv  = diag(1.0, (num.pred+1)),  
        mu.init                   = rep(1.0, (num.pred+1)), 
        Cinv.init                 = diag(1.0,(num.pred+1)))   
)

sigmasq.priors <- list(
  list(type = "inverse.gamma", 
       inverse.gamma.a = 1.0, 
       inverse.gamma.b = 1.0, 
       sigmasq.init = 0.1 ),       
  list( type="sigmasq.inverse", sigmasq.init = 0.1)
)

for (beta.prior in beta.priors)
{
  for(sigmasq.prior in sigmasq.priors)
  {
    set.seed(284698)
    MCMC.out <- bayes.regress(data.values, 
                              beta.prior, 
                              sigmasq.prior = sigmasq.prior, 
                              Tsamp.out = Tsamp.out)
    print(c(colMeans(MCMC.out$beta), mean(MCMC.out$sigmasq)))
  }
}

# Check that output is close to simulated values (although num.samp and
# Tsamp.out are small here); note that the output includes both beta and 
# sigmasq:
# c(-0.33,  0.78, -0.29,  0.47, -1.25,  0.05):


#######################################################################
## Read the data from a file, calculate the summary statistics and run 
## the Bayesian linear regression analysis
#######################################################################

Tsamp.out <- 100

## Assume non-zero y-intercept data.

# Read the files and compute summary statistics using the "read.regress.data.ff()" 
# function (in this package).


filename <- system.file('data/regressiondata.nz.all.csv.gz', package='BayesSummaryStatLM')
data.values <- read.regress.data.ff(filename)

# Calculate the number of predictors.

num.pred <- length(data.values$xty)-1

## Run all 6 combinations of priors for 3 "beta.prior" choices and 
#  2 "sigmasq.prior" choices:

beta.priors <- list(
  list( type = "flat"),
  
  list( type = "mvnorm.known", 
        mean.mu = rep(0.0,    (num.pred+1)), 
        prec.Cinv = diag(1.0, (num.pred+1))),
        
  list( type="mvnorm.unknown",
        mu.hyper.mean.eta         = rep(0.0,  (num.pred+1)),  
        mu.hyper.prec.Dinv    	  = diag(1.0, (num.pred+1)),  
        Cinv.hyper.df.lambda      = (num.pred+1), 
        Cinv.hyper.invscale.Vinv  = diag(1.0, (num.pred+1)),  
        mu.init                   = rep(1.0, (num.pred+1)),      
        Cinv.init                 = diag(1.0,(num.pred+1)))   
)

sigmasq.priors <- list(
  list(type = "inverse.gamma", inverse.gamma.a = 1.0, 
               inverse.gamma.b = 1.0, sigmasq.init = 0.5 ),
  list( type = "sigmasq.inverse", sigmasq.init = 0.5)
)

for (beta.prior in beta.priors)
{
  for(sigmasq.prior in sigmasq.priors)
  {

    set.seed(284698)
    MCMC.out <- bayes.regress(data.values, 
                              beta.prior, 
                              sigmasq.prior = sigmasq.prior, 
                              Tsamp.out = Tsamp.out)
                              
    print(c(colMeans(MCMC.out$beta), mean(MCMC.out$sigmasq)))
  }
}

# Check that output is close to simulated values (although num.samp and
# Tsamp.out are small here); note that the output includes both beta and                           
# sigmasq:
# c( 0.76, -0.92, 0.64, 0.57, -1.65, 0.25)

BayesSummaryStatLM documentation built on July 1, 2021, 5:06 p.m.

BayesSummaryStatLM index

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

BayesSummaryStatLM
MCMC Sampling of Bayesian Linear Models via Summary Statistics

bayes.regress: MCMC posterior sampling of Bayesian linear regression model...
In BayesSummaryStatLM: MCMC Sampling of Bayesian Linear Models via Summary Statistics

Description

Usage

Arguments

Details

Value

References

Examples

Related to bayes.regress in BayesSummaryStatLM...

R Package Documentation

Browse R Packages

We want your feedback!

BayesSummaryStatLM MCMC Sampling of Bayesian Linear Models via Summary Statistics

bayes.regress: MCMC posterior sampling of Bayesian linear regression model... In BayesSummaryStatLM: MCMC Sampling of Bayesian Linear Models via Summary Statistics

Description

Usage

Arguments

Details

Value

References

Examples

Related to bayes.regress in BayesSummaryStatLM...

R Package Documentation

Browse R Packages

We want your feedback!

BayesSummaryStatLM
MCMC Sampling of Bayesian Linear Models via Summary Statistics

bayes.regress: MCMC posterior sampling of Bayesian linear regression model...
In BayesSummaryStatLM: MCMC Sampling of Bayesian Linear Models via Summary Statistics