Exploring the behavior and ifluence of the prior in the tgp
package. It appears there may be a bug (or rather incorrect documentation) in how the mixed Gamma priors are defined for the nugget nug
and range / length-scale d
.
opts_knit$set(upload.fun = socialR::flickr.url) opts_chunk$set(dev.args=list(bg="transparent"), comment=NA, tidy=FALSE, message=FALSE)
library(MCMCpack) library(tgp) library(reshape2) library(ggplot2)
Consider some very simple data:
X = seq(-5, 5, length=50) x = c(-4, -3, -1, 0, 2) y = c(-2, 0, 1, 2, -1)
We choose prior parameters that give us very strong priors around particular values.
s2.p <- c(50,50) # InvGamma, approaches Gaussian approaching delta-fn around 1 tau2.p <- c(20,1) # InvGamma, approaches exponential approaching delta-fn around 0 d.p = c(10, 0.01, 10, 0.01) # Mixed Gammas, approaches Gaussian around 0.1, with var .001 nug.p = c(10, 0.1, 10, 0.1) # Mixed Gammas, approaches Gaussian around 1, with var .1
Visualize the priors on lengthscale and nugget:
d_prior <- function(x) dgamma(x, d.p[1], scale = d.p[2]) + dgamma(x, d.p[3], scale = d.p[4]) nug_prior <- function(x) dgamma(x, nug.p[1], scale = nug.p[2]) + dgamma(x, nug.p[3], scale = nug.p[4]) xx <- seq(.0001, 2, length.out=100) priors <- data.frame(x = xx, nug = nug_prior(xx), d = d_prior(xx)) priors <- melt(priors, id="x") ggplot(priors) + geom_line(aes(x, value)) + facet_wrap(~variable, scale="free")
Solve the gp with these priors:
gp <- bgp(X=x, XX=X, Z=y, verb=0, meanfn="constant", bprior="b0", BTE=c(1,2000,1), m0r1=FALSE, corr="exp", trace=TRUE, beta = 0, s2.p = s2.p, d.p = d.p, nug.p = nug.p, s2.lam = "fixed", d.lam = "fixed", nug.lam = "fixed", tau2.lam = "fixed", tau2.p = tau2.p)
Extract the posterior Gaussian process mean and the $\pm 2$ standard deviations over the predicted grid from the fit:
V <- gp$ZZ.ks2 tgp_dat <- data.frame(x = gp$XX[[1]], y = gp$ZZ.km, ymin = gp$ZZ.km - 1.96 * sqrt(gp$ZZ.ks2), ymax = gp$ZZ.km + 1.96 * sqrt(gp$ZZ.ks2))
For comparison, let us manually estimate the GP with hyperparameters fixed at the mean value of the priors.
d = .1; epsilon = .1; sigma = 1; #fixed hyperparamaters SE <- function(Xi,Xj, d) sigma * exp(- (Xi - Xj) ^ 2 / d) cov <- function(X, Y) outer(X, Y, SE, d) cov_xx_inv <- solve(cov(x, x) + epsilon * diag(1, length(x))) Ef <- cov(X, x) %*% cov_xx_inv %*% y Cf <- cov(X, X) - cov(X, x) %*% cov_xx_inv %*% cov(x, X) manual_dat <- data.frame(x = X, y = Ef, ymin = (Ef - 1.96 * sqrt(diag(Cf))), ymax = (Ef + 1.96 * sqrt(diag(Cf))))
Compare the GP posteriors in a plot:
ggplot(tgp_dat) + geom_ribbon(aes(x, y, ymin = ymin, ymax = ymax), fill="red", alpha = .1) + # Var geom_line(aes(x, y), col="red") + # mean geom_point(data = data.frame(x = x, y = y), aes(x, y)) + # raw data geom_ribbon(dat = manual_dat, aes(x, y, ymin = ymin, ymax = ymax), fill = "blue", alpha = .1) + # Var geom_line(dat = manual_dat, aes(x, y), col = "blue") + #MEAN theme_bw() + theme(plot.background = element_rect(fill = "transparent",colour = NA))
Wait a minute. The Bayesian estimate of the lengthscale seems way too long, given our strongly informative prior. Likewise the measurement uncertainty appears to have induced a rather broad distribution. Let's look at the posteriors:
require(reshape2) hyperparameters <- c("index", "s2", "tau2", "beta0", "nug", "d", "ldetK") posteriors <- melt(gp$trace$XX[[1]][,hyperparameters], id="index") ggplot(posteriors) + geom_histogram(aes(value)) + facet_wrap(~variable, scales="free")
Hmm... posteriors indeed show a huge value for the length scale d
, and for the nugget nug
. Looks supiciously like the mixed Gamma prior when the scale parameter is used as the rate parameter (it's inverse):
scale
misplaced as rate
in d
, nug
d_prior <- function(x) dgamma(x, d.p[1], d.p[2]) + dgamma(x, d.p[3], d.p[4]) nug_prior <- function(x) dgamma(x, nug.p[1], nug.p[2]) + dgamma(x, nug.p[3], nug.p[4]) xx <- seq(.0001, 2500, length.out=1000) priors <- data.frame(x = xx, nug = nug_prior(xx), d = d_prior(xx)) priors <- melt(priors, id="x") ggplot(priors) + geom_line(aes(x, value)) + facet_wrap(~variable, scale="free")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.