In cboettig/nonparametric-bayes: Nonparametric Bayes inference for ecological models

Exploring the behavior and ifluence of the prior in the tgp package. It appears there may be a bug (or rather incorrect documentation) in how the mixed Gamma priors are defined for the nugget nug and range / length-scale d.

opts_knit$set(upload.fun = socialR::flickr.url)
opts_chunk$set(dev.args=list(bg="transparent"), comment=NA, tidy=FALSE, message=FALSE)

library(MCMCpack)
library(tgp)
library(reshape2)
library(ggplot2)

Consider some very simple data:

X = seq(-5, 5, length=50)
x = c(-4, -3, -1,  0,  2)
y = c(-2,  0,  1,  2, -1)

Priors

We choose prior parameters that give us very strong priors around particular values.

s2.p <- c(50,50) # InvGamma, approaches Gaussian approaching delta-fn around 1
tau2.p <- c(20,1) # InvGamma, approaches exponential approaching delta-fn around 0
d.p = c(10, 0.01, 10, 0.01) # Mixed Gammas, approaches Gaussian around 0.1, with var .001
nug.p = c(10, 0.1, 10, 0.1) # Mixed Gammas, approaches Gaussian around 1, with var .1

Visualize the priors on lengthscale and nugget:

d_prior <- function(x) dgamma(x, d.p[1], scale = d.p[2]) + dgamma(x, d.p[3], scale = d.p[4])
nug_prior <- function(x) dgamma(x, nug.p[1], scale = nug.p[2]) + dgamma(x, nug.p[3], scale = nug.p[4])
xx <- seq(.0001, 2, length.out=100)
priors <- data.frame(x = xx, nug = nug_prior(xx), d = d_prior(xx))
priors <- melt(priors, id="x")
ggplot(priors) + geom_line(aes(x, value)) + facet_wrap(~variable, scale="free")

Solve the gp with these priors:

gp <- bgp(X=x, XX=X, Z=y, verb=0,
          meanfn="constant", bprior="b0", BTE=c(1,2000,1), m0r1=FALSE, 
          corr="exp", trace=TRUE, beta = 0,
          s2.p = s2.p, d.p = d.p, nug.p = nug.p,
          s2.lam = "fixed", d.lam = "fixed", nug.lam = "fixed", 
          tau2.lam = "fixed", tau2.p = tau2.p)

Extract the posterior Gaussian process mean and the $\pm 2$ standard deviations over the predicted grid from the fit:

V <- gp$ZZ.ks2
tgp_dat <- data.frame(x   = gp$XX[[1]], 
                  y   = gp$ZZ.km, 
                 ymin = gp$ZZ.km - 1.96 * sqrt(gp$ZZ.ks2), 
                 ymax = gp$ZZ.km + 1.96 * sqrt(gp$ZZ.ks2))

For comparison, let us manually estimate the GP with hyperparameters fixed at the mean value of the priors.

d = .1; epsilon = .1; sigma = 1; #fixed hyperparamaters
SE <- function(Xi,Xj, d) sigma * exp(- (Xi - Xj) ^ 2 / d)
cov <- function(X, Y) outer(X, Y, SE, d) 
cov_xx_inv <- solve(cov(x, x) + epsilon * diag(1, length(x)))
Ef <- cov(X, x) %*% cov_xx_inv %*% y
Cf <- cov(X, X) - cov(X, x)  %*% cov_xx_inv %*% cov(x, X)
manual_dat <- data.frame(x = X, 
                         y = Ef, 
                         ymin = (Ef - 1.96 * sqrt(diag(Cf))), 
                         ymax = (Ef + 1.96 * sqrt(diag(Cf))))

Compare the GP posteriors in a plot:

ggplot(tgp_dat) +
    geom_ribbon(aes(x, y, ymin = ymin, ymax = ymax), fill="red", alpha = .1) + # Var
    geom_line(aes(x, y), col="red") + # mean
    geom_point(data = data.frame(x = x, y = y), aes(x, y)) + # raw data
    geom_ribbon(dat = manual_dat, aes(x, y, ymin = ymin, ymax = ymax), fill = "blue", alpha = .1) + # Var
    geom_line(dat = manual_dat, aes(x, y), col = "blue")  + #MEAN    
    theme_bw() + theme(plot.background = element_rect(fill = "transparent",colour = NA))

Wait a minute. The Bayesian estimate of the lengthscale seems way too long, given our strongly informative prior. Likewise the measurement uncertainty appears to have induced a rather broad distribution. Let's look at the posteriors:

require(reshape2)
hyperparameters <- c("index", "s2", "tau2", "beta0", "nug", "d", "ldetK")
posteriors <- melt(gp$trace$XX[[1]][,hyperparameters], id="index")
ggplot(posteriors) + geom_histogram(aes(value)) + facet_wrap(~variable, scales="free")

Hmm... posteriors indeed show a huge value for the length scale d, and for the nugget nug. Looks supiciously like the mixed Gamma prior when the scale parameter is used as the rate parameter (it's inverse):

Compare to priors with `scale` misplaced as `rate` in `d`, `nug`

d_prior <- function(x) dgamma(x, d.p[1], d.p[2]) + dgamma(x, d.p[3], d.p[4])
nug_prior <- function(x) dgamma(x, nug.p[1], nug.p[2]) + dgamma(x, nug.p[3], nug.p[4])
xx <- seq(.0001, 2500, length.out=1000)
priors <- data.frame(x = xx, nug = nug_prior(xx), d = d_prior(xx))
priors <- melt(priors, id="x")
ggplot(priors) + geom_line(aes(x, value)) + facet_wrap(~variable, scale="free")

cboettig/nonparametric-bayes documentation built on May 13, 2019, 2:09 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

cboettig/nonparametric-bayes
Nonparametric Bayes inference for ecological models

In cboettig/nonparametric-bayes: Nonparametric Bayes inference for ecological models

Priors

Compare to priors with `scale` misplaced as `rate` in `d`, `nug`

R Package Documentation

Browse R Packages

We want your feedback!

cboettig/nonparametric-bayes Nonparametric Bayes inference for ecological models

In cboettig/nonparametric-bayes: Nonparametric Bayes inference for ecological models

Priors

Compare to priors with scale misplaced as rate in d, nug

R Package Documentation

Browse R Packages

We want your feedback!

cboettig/nonparametric-bayes
Nonparametric Bayes inference for ecological models

Compare to priors with `scale` misplaced as `rate` in `d`, `nug`