Exploring the behavior and ifluence of the prior in the tgp package. It appears there may be a bug (or rather incorrect documentation) in how the mixed Gamma priors are defined for the nugget nug and range / length-scale d.
library(MCMCpack)
library(tgp)
library(reshape2)
library(ggplot2)
Consider some very simple data:
X = seq(-5, 5, length=50)
x = c(-4, -3, -1, 0, 2)
y = c(-2, 0, 1, 2, -1)
We choose prior parameters that give us very strong priors around particular values.
s2.p <- c(50,50) # InvGamma, approaches Gaussian approaching delta-fn around 1
tau2.p <- c(20,1) # InvGamma, approaches exponential approaching delta-fn around 0
d.p = c(10, 0.01, 10, 0.01) # Mixed Gammas, approaches Gaussian around 0.1, with var .001
nug.p = c(10, 0.1, 10, 0.1) # Mixed Gammas, approaches Gaussian around 1, with var .1
Visualize the priors on lengthscale and nugget:
d_prior <- function(x) dgamma(x, d.p[1], scale = d.p[2]) + dgamma(x, d.p[3], scale = d.p[4])
nug_prior <- function(x) dgamma(x, nug.p[1], scale = nug.p[2]) + dgamma(x, nug.p[3], scale = nug.p[4])
xx <- seq(.0001, 2, length.out=100)
priors <- data.frame(x = xx, nug = nug_prior(xx), d = d_prior(xx))
priors <- melt(priors, id="x")
ggplot(priors) + geom_line(aes(x, value)) + facet_wrap(~variable, scale="free")
Solve the gp with these priors:
gp <- bgp(X=x, XX=X, Z=y, verb=0,
meanfn="constant", bprior="b0", BTE=c(1,2000,1), m0r1=FALSE,
corr="exp", trace=TRUE, beta = 0,
s2.p = s2.p, d.p = d.p, nug.p = nug.p,
s2.lam = "fixed", d.lam = "fixed", nug.lam = "fixed",
tau2.lam = "fixed", tau2.p = tau2.p)
Extract the posterior Gaussian process mean and the $\pm 2$ standard deviations over the predicted grid from the fit:
V <- gp$ZZ.ks2
tgp_dat <- data.frame(x = gp$XX[[1]],
y = gp$ZZ.km,
ymin = gp$ZZ.km - 1.96 * sqrt(gp$ZZ.ks2),
ymax = gp$ZZ.km + 1.96 * sqrt(gp$ZZ.ks2))
For comparison, let us manually estimate the GP with hyperparameters fixed at the mean value of the priors.
d = .1; epsilon = .1; sigma = 1; #fixed hyperparamaters
SE <- function(Xi,Xj, d) sigma * exp(- (Xi - Xj) ^ 2 / d)
cov <- function(X, Y) outer(X, Y, SE, d)
cov_xx_inv <- solve(cov(x, x) + epsilon * diag(1, length(x)))
Ef <- cov(X, x) %*% cov_xx_inv %*% y
Cf <- cov(X, X) - cov(X, x) %*% cov_xx_inv %*% cov(x, X)
manual_dat <- data.frame(x = X,
y = Ef,
ymin = (Ef - 1.96 * sqrt(diag(Cf))),
ymax = (Ef + 1.96 * sqrt(diag(Cf))))
Compare the GP posteriors in a plot:
ggplot(tgp_dat) +
geom_ribbon(aes(x, y, ymin = ymin, ymax = ymax), fill="red", alpha = .1) + # Var
geom_line(aes(x, y), col="red") + # mean
geom_point(data = data.frame(x = x, y = y), aes(x, y)) + # raw data
geom_ribbon(dat = manual_dat, aes(x, y, ymin = ymin, ymax = ymax), fill = "blue", alpha = .1) + # Var
geom_line(dat = manual_dat, aes(x, y), col = "blue") + #MEAN
theme_bw() + theme(plot.background = element_rect(fill = "transparent",colour = NA))
Wait a minute. The Bayesian estimate of the lengthscale seems way too long, given our strongly informative prior. Likewise the measurement uncertainty appears to have induced a rather broad distribution. Let's look at the posteriors:
require(reshape2)
hyperparameters <- c("index", "s2", "tau2", "beta0", "nug", "d", "ldetK")
posteriors <- melt(gp$trace$XX[[1]][,hyperparameters], id="index")
ggplot(posteriors) + geom_histogram(aes(value)) + facet_wrap(~variable, scales="free")
Hmm... posteriors indeed show a huge value for the length scale d, and for the nugget nug. Looks supiciously like the mixed Gamma prior when the scale parameter is used as the rate parameter (it's inverse):
scale misplaced as rate in d, nugd_prior <- function(x) dgamma(x, d.p[1], d.p[2]) + dgamma(x, d.p[3], d.p[4])
nug_prior <- function(x) dgamma(x, nug.p[1], nug.p[2]) + dgamma(x, nug.p[3], nug.p[4])
xx <- seq(.0001, 2500, length.out=1000)
priors <- data.frame(x = xx, nug = nug_prior(xx), d = d_prior(xx))
priors <- melt(priors, id="x")
ggplot(priors) + geom_line(aes(x, value)) + facet_wrap(~variable, scale="free")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.