Exploring the behavior and ifluence of the prior in the tgp
package. It appears there may be a bug (or rather incorrect documentation) in how the mixed Gamma priors are defined for the nugget nug
and range / length-scale d
.
library(MCMCpack)
library(tgp)
library(reshape2)
library(ggplot2)
Consider some very simple data:
X = seq(-5, 5, length=50)
x = c(-4, -3, -1, 0, 2)
y = c(-2, 0, 1, 2, -1)
We choose prior parameters that give us very strong priors around particular values.
s2.p <- c(50,50) # InvGamma, approaches Gaussian approaching delta-fn around 1
tau2.p <- c(20,1) # InvGamma, approaches exponential approaching delta-fn around 0
d.p = c(10, 0.01, 10, 0.01) # Mixed Gammas, approaches Gaussian around 0.1, with var .001
nug.p = c(10, 0.1, 10, 0.1) # Mixed Gammas, approaches Gaussian around 1, with var .1
Visualize the priors on lengthscale and nugget:
d_prior <- function(x) dgamma(x, d.p[1], scale = d.p[2]) + dgamma(x, d.p[3], scale = d.p[4])
nug_prior <- function(x) dgamma(x, nug.p[1], scale = nug.p[2]) + dgamma(x, nug.p[3], scale = nug.p[4])
xx <- seq(.0001, 2, length.out=100)
priors <- data.frame(x = xx, nug = nug_prior(xx), d = d_prior(xx))
priors <- melt(priors, id="x")
ggplot(priors) + geom_line(aes(x, value)) + facet_wrap(~variable, scale="free")
Solve the gp with these priors:
gp <- bgp(X=x, XX=X, Z=y, verb=0,
meanfn="constant", bprior="b0", BTE=c(1,2000,1), m0r1=FALSE,
corr="exp", trace=TRUE, beta = 0,
s2.p = s2.p, d.p = d.p, nug.p = nug.p,
s2.lam = "fixed", d.lam = "fixed", nug.lam = "fixed",
tau2.lam = "fixed", tau2.p = tau2.p)
Extract the posterior Gaussian process mean and the $\pm 2$ standard deviations over the predicted grid from the fit:
V <- gp$ZZ.ks2
tgp_dat <- data.frame(x = gp$XX[[1]],
y = gp$ZZ.km,
ymin = gp$ZZ.km - 1.96 * sqrt(gp$ZZ.ks2),
ymax = gp$ZZ.km + 1.96 * sqrt(gp$ZZ.ks2))
For comparison, let us manually estimate the GP with hyperparameters fixed at the mean value of the priors.
d = .1; epsilon = .1; sigma = 1; #fixed hyperparamaters
SE <- function(Xi,Xj, d) sigma * exp(- (Xi - Xj) ^ 2 / d)
cov <- function(X, Y) outer(X, Y, SE, d)
cov_xx_inv <- solve(cov(x, x) + epsilon * diag(1, length(x)))
Ef <- cov(X, x) %*% cov_xx_inv %*% y
Cf <- cov(X, X) - cov(X, x) %*% cov_xx_inv %*% cov(x, X)
manual_dat <- data.frame(x = X,
y = Ef,
ymin = (Ef - 1.96 * sqrt(diag(Cf))),
ymax = (Ef + 1.96 * sqrt(diag(Cf))))
Compare the GP posteriors in a plot:
ggplot(tgp_dat) +
geom_ribbon(aes(x, y, ymin = ymin, ymax = ymax), fill="red", alpha = .1) + # Var
geom_line(aes(x, y), col="red") + # mean
geom_point(data = data.frame(x = x, y = y), aes(x, y)) + # raw data
geom_ribbon(dat = manual_dat, aes(x, y, ymin = ymin, ymax = ymax), fill = "blue", alpha = .1) + # Var
geom_line(dat = manual_dat, aes(x, y), col = "blue") + #MEAN
theme_bw() + theme(plot.background = element_rect(fill = "transparent",colour = NA))
Wait a minute. The Bayesian estimate of the lengthscale seems way too long, given our strongly informative prior. Likewise the measurement uncertainty appears to have induced a rather broad distribution. Let's look at the posteriors:
require(reshape2)
hyperparameters <- c("index", "s2", "tau2", "beta0", "nug", "d", "ldetK")
posteriors <- melt(gp$trace$XX[[1]][,hyperparameters], id="index")
ggplot(posteriors) + geom_histogram(aes(value)) + facet_wrap(~variable, scales="free")
Hmm... posteriors indeed show a huge value for the length scale d
, and for the nugget nug
. Looks supiciously like the mixed Gamma prior when the scale parameter is used as the rate parameter (it's inverse):
scale
misplaced as rate
in d
, nug
d_prior <- function(x) dgamma(x, d.p[1], d.p[2]) + dgamma(x, d.p[3], d.p[4])
nug_prior <- function(x) dgamma(x, nug.p[1], nug.p[2]) + dgamma(x, nug.p[3], nug.p[4])
xx <- seq(.0001, 2500, length.out=1000)
priors <- data.frame(x = xx, nug = nug_prior(xx), d = d_prior(xx))
priors <- melt(priors, id="x")
ggplot(priors) + geom_line(aes(x, value)) + facet_wrap(~variable, scale="free")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.