bfs: Bayes Factor Data


Description

Data from a Bayes factor MCMC-based simulation experiment comparing Student-t to Gaussian errors in an RJ-based Laplace prior Bayesian linear regression setting

Usage

data(bfs)

Format

Calling data(bfs) causes the following objects to be loaded into the workspace.

bfs.exp

20x11 data.frame whose first column is \theta, indicating the mean parameter of an exponential distribution encoding the prior on the Student-t degrees of freedom parameter \nu. The remaining ten columns contain Bayes factor evaluations under that setting

bfs.gamma

80x7 data.frame whose first two columns are \beta and \alpha, the second and first parameters of a Gamma distribution encoding the prior on the Student-t degrees of freedom parameter \nu. The remaining five columns contain Bayes factor evaluations under those settings

Details

Gramacy & Pantaleo (2010), Sections 3.3-3.4, describe an experiment involving Bayes factor (BF) calculations to determine whether data are leptokurtic (Student-t errors) or not (simply Gaussian) as a function of the prior parameterization on the Student-t degrees of freedom parameter \nu. Franck & Gramacy (2018) created a grid of hyperparameter values in \theta, describing the mean of an exponential distribution, evenly spaced in \log_{10} space from 10^(-3) to 10^6, spanning “solidly Student-t” (even Cauchy) to “essentially Gaussian” in terms of the mean of the prior over \nu. For each \theta setting on the grid they ran a Reversible Jump (RJ) MCMC to approximate the BF of Student-t over Gaussian, feeding sample likelihood evaluations provided by monomvn's blasso into the calculation. In order to understand the Monte Carlo variability in those calculations, ten replicates of the BFs were collected under each hyperparameter setting. These data are provided in bfs.exp.
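The grid described above is easy to reconstruct. The following base-R sketch (illustrative only, not the authors' code; the size and endpoints follow the description) builds twenty values evenly spaced on the log10 scale from 10^-3 to 10^6:

```r
## Sketch of the hyperparameter grid from the text: twenty values of
## theta, evenly spaced on the log10 scale from 10^-3 to 10^6
theta <- 10^seq(-3, 6, length.out = 20)

range(theta)                  # endpoints 1e-03 and 1e+06
diff(log10(theta))            # constant spacing on the log10 scale
```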

A similar, larger experiment was performed with \nu under a Gamma prior with parameters \alpha and \beta \equiv \theta. In this higher-dimensional space, a Latin hypercube sample of size eighty was created, and five replicates of BFs were recorded at each design point. These data are provided in bfs.gamma.
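A Latin hypercube design of this shape can be sketched in base R. This is an illustration only, not the original design; the mapping of each coordinate to 10^-3 through 10^1 is an assumption made here for concreteness:

```r
## Minimal base-R Latin hypercube sketch: 80 points in 2d, with exactly
## one point per stratum in each coordinate, jittered within its stratum
set.seed(42)
n <- 80
unit <- cbind((sample(n) - runif(n)) / n,   # first coordinate in (0,1)
              (sample(n) - runif(n)) / n)   # second coordinate in (0,1)

## map to a plausible range on the log10 scale (an assumption here)
D <- 10^(unit * 4 - 3)                      # each coordinate in (1e-3, 10)
```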

The examples below involve mleHetTP fits (Chung, et al., 2018) to these data and a visualization of the resulting predictive surfaces. The code here follows an example provided, in more detail, in vignette("hetGP").

Note

For code showing how these BFs were calculated, see the supplementary material of Franck & Gramacy (2018).

Author(s)

Mickael Binois, mbinois@mcs.anl.gov, and Robert B. Gramacy, rbg@vt.edu

References

Franck CT, Gramacy RB (2018). Assessing Bayes factor surfaces using interactive visualization and computer surrogate modeling. Preprint available on arXiv:1809.05580.

Gramacy RB (2017). monomvn: Estimation for Multivariate Normal and Student-t Data with Monotone Missingness. R package version 1.9-7, https://CRAN.R-project.org/package=monomvn.

Gramacy RB, Pantaleo E (2010). Shrinkage regression for multivariate inference with missing data, and an application to portfolio balancing. Bayesian Analysis, 5(2), 237-262. Preprint available on arXiv:0907.2135.

Chung M, Binois M, Gramacy RB, Moquin DJ, Smith AP, Smith AM (2018). Parameter and Uncertainty Estimation for Dynamical Systems Using Surrogate Stochastic Processes. SIAM Journal on Scientific Computing, 41(4), 2212-2238. Preprint available on arXiv:1802.00852.

See Also

ato, sirEval, mleHetTP, vignette("hetGP"), blasso

Examples

data(bfs)

##
## Exponential version first
##

thetas <- matrix(bfs.exp$theta, ncol=1)
bfs <- as.matrix(t(bfs.exp[,-1]))

## the data are heavy tailed, so t-errors help
bfs1 <- mleHetTP(X=list(X0=log10(thetas), Z0=colMeans(log(bfs)),
  mult=rep(nrow(bfs), ncol(bfs))), Z=log(as.numeric(bfs)), lower=10^(-4), 
  upper=5, covtype="Matern5_2")

## predictions on a grid in 1d
dx <- seq(0,1,length=100)
dx <- 10^(dx*4 - 3)
p <- predict(bfs1, matrix(log10(dx), ncol=1))

## visualization
matplot(log10(thetas), t(log(bfs)), col=1, pch=21, ylab="log(bf)", 
  main="Bayes factor surface")
lines(log10(dx), p$mean, lwd=2, col=2)
lines(log10(dx), p$mean + 2*sqrt(p$sd2 + p$nugs), col=2, lty=2, lwd=2)
lines(log10(dx), p$mean - 2*sqrt(p$sd2 + p$nugs), col=2, lty=2, lwd=2)
legend("topleft", c("hetTP mean", "hetTP interval"), lwd=2, lty=1:2, col=2)

##
## Now Gamma version
##

D <- as.matrix(bfs.gamma[,1:2])
bfs <- as.matrix(t(bfs.gamma[,-(1:2)]))

## fitting in 2d
bfs2 <- mleHetTP(X=list(X0=log10(D), Z0=colMeans(log(bfs)), 
  mult=rep(nrow(bfs), ncol(bfs))), Z = log(as.numeric(bfs)), 
  lower = rep(10^(-4), 2), upper = rep(5, 2), covtype = "Matern5_2", 
  maxit=100000)

## predictions on a grid in 2d
dx <- seq(0,1,length=100)
dx <- 10^(dx*4 - 3)
DD <- as.matrix(expand.grid(dx, dx))
p <- predict(bfs2, log10(DD))

## visualization via image-contour plots
par(mfrow=c(1,2))
mbfs <- colMeans(bfs)
image(log10(dx), log10(dx), t(matrix(p$mean, ncol=length(dx))),  
  col=heat.colors(128), xlab="log10 alpha", ylab="log10 beta", 
  main="mean log BF")
text(log10(D[,2]), log10(D[,1]), signif(log(mbfs), 2), cex=0.5)
contour(log10(dx), log10(dx),t(matrix(p$mean, ncol=length(dx))),
  levels=c(-5,-3,-1,0,1,3,5), add=TRUE, col=4)
image(log10(dx), log10(dx), t(matrix(sqrt(p$sd2 + p$nugs), 
  ncol=length(dx))),  col=heat.colors(128), xlab="log10 alpha", 
  ylab="log10 beta", main="sd log BF")
text(log10(D[,2]), log10(D[,1]), signif(apply(log(bfs), 2, sd), 2), 
  cex=0.5)
