In cboettig/nonparametric-bayes: Nonparametric Bayes inference for ecological models

library("methods")
library("rmarkdown")
render("manuscript.Rmd")

library("knitr")
basename <- "supplement"
opts_chunk$set(fig.path = paste("components/figure/", basename, "-", sep=""),
               cache.path = paste("components/cache/", basename, "/", sep=""))
opts_chunk$set(cache = 2) 
opts_chunk$set(tidy=FALSE, warning=FALSE, message=FALSE, 
               comment = NA, verbose = TRUE, echo=FALSE)

opts_chunk$set(dev.opts=c(version="1.7"), dev="pdf")

\tableofcontents

\appendix \renewcommand{\thefigure}{S\arabic{figure}} \renewcommand{\thetable}{S\arabic{table}} \setcounter{figure}{0} \setcounter{table}{0}

Code

All code used in producing this analysis has been embedded into the manuscript sourcefile using the Dynamic Documentation tool, knitr for the R language [@Xie_2013], available at github.com/cboettig/nonparametric-bayes/

To help make the mathematical and computational approaches presented here more accessible, we provide a free and open source (MIT License) R package that implements the GPDP process as it is presented here. Users should note that at this time, the R package has been developed and tested explicitly for this analysis and is not yet intended as a general purpose tool. The manuscript source-code described above illustrates how these functions are used in this analysis. This package can be installed following the directions above.

Dependencies & Reproducibility

The code provided should run on any common platform (Windows, Mac or Linux) that has R and the necessary R packages installed (including support for the jags Gibbs sampler). The DESCRIPTION file of our R package, nonparametricbayes, lists all the software required to use these methods. Additional software requirements for the other comparisons shown here, such as the Gibbs sampling for the parametric models, are listed under the Suggested packages list.

Nonetheless, installing the dependencies needed is not a trivial task, and may become more difficult over time as software continues to evolve. To facilitate reuse, we also provide a Dockerfile and Docker image that can be used to replicate and explore the analyses here by providing a copy of the computational environment we have used, with all software installed. Docker software (see docker.com) runs on most platforms as well as cloud servers. Use the command:

docker run -dP cboettig/nonparametric-bayes

to launch an RStudio instance with the necessary software already installed. See the Rocker-org project, github.com/rocker-org for more detailed documentation on using Docker with R.

Data

Dryad Data Archive

While the data can be regenerated using the code provided, for convenience CSV files of the data shown in each graph are made available on Dryad, along with the source .Rmd files for the manuscript and supplement that document them.

Training data description

Each of our models $f(S_t)$ must be estimated from training data, which we simulate from the Allen model with parameters $r = $ r p[1], $K =$ r p[2], $C =$ r p[3], and $\sigma_g =$ r sigma_g for $T=$ r Tobs timesteps, starting at initial condition $X_0 = $ r Xo. The training data can be seen in Figure 1 and found in the table figure1.csv.

Training data for sensitivity analyses

A further 96 unique randomly generated training data sets are generated for the sensitivity analysis, as described in the main text. The code provided replicates the generation of these sets.

Model performance outside the predicted range (Fig S1)

Figure S1 illustrates the performance of the GP and parametric models outside the observed training data. The mean trajectory under the underlying model is shown by the black dots, while the corresponding prediction made by the model shown by the box and whiskers plots. Predictions are based on the true expected value in the previous time step. Predicted distributions that lie entirely above the expected dynamics indicate the expectation of stock sizes higher than what is actually expected. The models differ both in their expectations and their uncertainty (colored bands show two standard deviations away). Note that the GP is particularly uncertain about the dynamics relative to structurally incorrect models like the Ricker.

y <- numeric(8)
y[1] <- 4.5
for(t in 1:(length(y)-1))
      y[t+1] = z_g() * f(y[t], h=0, p=p)
# predicts means, does not reflect uncertainty estimate!
crash_data <- step_ahead_posteriors(y)
crash_data <- subset(crash_data, variable %in% c("GP", "Allen", "Ricker", "Myers"))
ggplot(crash_data) + 
  geom_boxplot(aes(as.factor(as.integer(time)), value, 
                   fill = variable, col=variable), 
               outlier.size=1, position="identity") + 
#  geom_line(aes(time, value, col = variable, 
#            group=interaction(L1,variable))) + 
  geom_point(aes(time, stock), size = 3) + 
  scale_fill_manual(values=colorkey[c("GP", "Allen", "Ricker", "Myers")]) +  
  scale_colour_manual(values=colorkey[c("GP", "Allen", "Ricker", "Myers")]) +  
  facet_wrap(~variable) + 
  theme(legend.position="none") + xlab("time") + ylab("stock size") 

write.csv(crash_data, "components/data/figureS1.csv")

## Forgo the EML generation, the Rmd files provide better documentation at this stage.
library(EML)
me <- "Carl Boettiger <cboettig@ropensci.org>"
models <- levels(crash_data$variable)
names(models) <- models
col.defs <- c("time", "fish stock", "model", "value", "replicate")
unit.defs <- list("number", "number", models, "number", "number")
col.classes <- sapply(crash_data, class)
eml_write("components/data/figureS1.csv", col.classes = col.classes, col.defs=col.defs,unit.defs=unit.defs,creator=me, title="Figure S1", file="components/data/README_S1.xml")

\newpage

Further sensitivity analysis (Fig S2 - 3)

We perform 2 sensitivity analyses. The first focuses on illustrating the robustness of the approach to the two parameters that most influence stochastic transitions across the tipping point: the position of the Allee threshold and the scale of the noise (Fig S2).

Changing the intensity of the stochasticity or the distance between stable and unstable steady states does not impact the performance of the GP relative to the optimal solution obtained from the true model and true parameters. The parametric models are more sensitive to this difference. Large values of $\sigma$ relative to the distance between the stable and unstable point increases the chance of a stochastic transition below the tipping point. More precisely, if we let $L$ be the distance between the stable and unstable steady states, then the probability that fluctuations drive the population across the unstable steady state scales as

$\exp\left(\frac{L^2}{\sigma^2}\right)$

(see @Gardiner2009 or @Mangel2006 for the derivation).

Thus, the impact of using a model that underestimates the risk of harvesting beyond the critical point is considerable, since this such a situation occurs more often. Conversely, with large enough distance between the optimal escapement and unstable steady state relative to $\sigma$, the chance of a transition becomes vanishingly small and all models can be estimated near-optimally. Models that underestimate the cost incurred by population sizes fluctuating significantly below the optimal escapement level will not perform poorly as long as those fluctuations are sufficiently small. Fig S2 shows the net present value of managing under teh GPDP remains close to the optimal value (ratio of 1), despite varying across either noise level or the the position of the allee threshold.

plot_sigmas <- ggplot(vary_sigma, aes(noise, value)) + 
  geom_point(size=2) + ylab("Net present value") + 
  xlab("Level of growth stochasticity")
plot_allees <- ggplot(vary_allee, aes(allee, value)) + 
  geom_point(size=2)+ ylab("Net present value") + 
  xlab("Allee threshold stock size")
multiplot(plot_sigmas, plot_allees, cols=2) 
write.csv(vary_sigma, "components/data/figureS2a.csv")
write.csv(vary_allee, "components/data/figureS2b.csv")

The Latin hypercube approach systematically varies all combinations of parameters, providing a more general test than varying only one parameter at a time. We loop across eight replicates of three different randomly generated parameter sets for each of two different generating models (Allen and Myers) over two different noise levels (0.01 and 0.05), for a total of 8 x 3 x 2 x 2 = 96 scenarios. The Gaussian Process performs nearly optimally in each case, relative to the optimal solution with no parameter or model uncertainty (Figure S10, appendix).

source("components/sensitivity.R")

models <- c("Myers","Allen")

parameters <- list(
  Myers = list(
    c(r=1.5 + rnorm(1, 0, 1), theta=2 + rnorm(1, 0, 1), K=10 + rnorm(1, 0, 1)),
    c(r=1.5 + rnorm(1, 0, 1), theta=2.5 + rnorm(1, 0, 1), K=10 + rnorm(1, 0, 1))),
  Allen = list(
    c(r=2 + rnorm(1, 0, 1), K=10 + rnorm(1, 0, 1), C=4 + rnorm(1, 0, 1)),
    c(r=2 + rnorm(1, 0, 1), K=10 + rnorm(1, 0, 1), C=4 + rnorm(1, 0, 1)))
)
nuisance_pars <- c("sigma_g")
nuisance_values <- list(sigma_g = c(0.01, 0.05))

model <- "Allen"
allen1.01 <- sensitivity(model, 
                   parameters = parameters[[model]][[1]], 
                   nuisance = c(sigma_g = nuisance_values$sigma_g[1]), 
                   seed=c(1111, 2222, 3333))
allen2.01 <- sensitivity(model, 
                   parameters = parameters[[model]][[2]], 
                   nuisance = c(sigma_g = nuisance_values$sigma_g[1]), 
                   seed=c(1111, 2222, 3333))
allen1.05 <- sensitivity(model, 
                   parameters = parameters[[model]][[1]], 
                   nuisance = c(sigma_g = nuisance_values$sigma_g[2]), 
                   seed=c(1111, 2222, 3333))
allen2.05 <- sensitivity(model, 
                   parameters = parameters[[model]][[2]], 
                   nuisance = c(sigma_g = nuisance_values$sigma_g[2]), 
                   seed=c(1111, 2222, 3333))

model <- "Myers"
Myers1.01 <- sensitivity(model, 
                   parameters = parameters[[model]][[1]], 
                   nuisance = c(sigma_g = nuisance_values$sigma_g[1]), 
                   seed=c(1111, 2222, 3333))
Myers2.01 <- sensitivity(model, 
                   parameters = parameters[[model]][[2]], 
                   nuisance = c(sigma_g = nuisance_values$sigma_g[1]), 
                   seed=c(1111, 2222, 3333))
Myers1.05 <- sensitivity(model, 
                   parameters = parameters[[model]][[1]], 
                   nuisance = c(sigma_g = nuisance_values$sigma_g[2]), 
                   seed=c(1111, 2222, 3333))
Myers2.05 <- sensitivity(model, 
                   parameters = parameters[[model]][[2]], 
                   nuisance = c(sigma_g = nuisance_values$sigma_g[2]), 
                   seed=c(1111, 2222, 3333))

## Assemble into data.frame
allen_dat <- rbind(allen1.01, allen1.05, allen2.01, allen2.05) 
m <- rbind(Myers1.01, Myers1.05, Myers2.01, Myers2.05)
myers_dat <- m[c(1:2,4,3,5:8)]
names(myers_dat) <- names(allen_dat)
model_dat <- rbind(allen_dat, myers_dat)
dat <- model_dat
dat$pars.r <- factor(dat$pars.r, labels=c("A", "B", "C", "D"))
dat <- dat[c(1:2,5:6, 8, 7)]
dat$noise <- factor(dat$noise)
names(dat) <- c("model", "parameters", "replicate", "simulation", "noise", "value")

## Extract acutal parameter values corresponding to each parameter set
p1 = as.numeric(levels(factor(model_dat$pars.r)))
p2 = as.numeric(levels(factor(model_dat$pars.K)))
p3 = as.numeric(levels(factor(model_dat$pars.C)))

set.A = c(r = p1[1], K = p2[1], theta = p3[1])
set.B = c(r = p1[2], K = p2[2], theta = p3[2])
set.C = c(r = p1[3], K = p2[3], C = p3[3])
set.D = c(r = p1[4], K = p2[4], C = p3[4])
AllenParameterSets <- rbind(set.A, set.B)
MyersParameterSets <- rbind(set.C, set.D)

sensitivity_dat <- dat

ggplot(sensitivity_dat) + 
  geom_histogram(aes(value, fill=parameters)) + 
  xlim(0,1.0) + 
  theme_bw() + 
  xlab("value as fraction of the optimal") + 
  facet_grid(noise~model)
write.csv(sensitivity_dat, "components/data/figureS3.csv")

pandoc.table(AllenParameterSets, caption="Randomly chosen parameter sets for the Allen models in Figure S3." )

pandoc.table(MyersParameterSets, caption="Randomly chosen parameter sets for the Myers models in Figure S3." )

\newpage

MCMC analyses

This section provides figures and tables showing the traces from each of the MCMC runs used to estimate the parameters of the models presented, along with the resulting posterior distributions for each parameter. Priors usually appear completely flat when shown against the posteriors, but are summarized by tables the parameters of their corresponding distributions for each case.

GP MCMC (Fig S4-5)

gp_assessment_plots$traces_plot

gp_assessment_plots$posteriors_plot
write.csv(tgp_dat,"components/data/figureS4-5.csv")

\newpage \newpage

Ricker Model MCMC (Fig S6-7)

ggplot(ricker_posteriors) + 
  geom_line(aes(index, value)) + # priors, need to fix order though
  facet_wrap(~ variable, scale="free_y", ncol=1)

ggplot(ricker_posteriors, aes(value)) + 
  stat_density(geom="path", position="identity") +
  facet_wrap(~ variable, scale="free", ncol=2)
write.csv(ricker_posteriors, "components/data/figureS6-7.csv")

pander::pandoc.table(ricker_priors_xtable,
  caption = "Parameterization range for the uniform priors in the Ricker model")

\newpage \newpage

Myers Model MCMC (Fig S8-9) ##

ggplot(myers_posteriors) + 
  geom_line(aes(index, value)) + # priors, need to fix order though
  facet_wrap(~ variable, scale="free_y", ncol=1)

ggplot(myers_posteriors, aes(value)) + 
  stat_density(geom="path", position="identity") +
  facet_wrap(~ variable, scale="free", ncol=2)
write.csv(myers_posteriors, "components/data/figureS8-9.csv")

pander::pandoc.table(myers_priors_xtable,
           caption = "Parameterization range for the uniform priors in the Myers model")

\newpage \newpage

Allen Model MCMC (Fig S10-11)

ggplot(allen_posteriors) + 
  geom_line(aes(index, value)) + # priors, need to fix order though
  facet_wrap(~ variable, scale="free_y", ncol=1)

ggplot(allen_posteriors, aes(value)) + 
  stat_density(geom="path", position="identity") +
  facet_wrap(~ variable, scale="free", ncol=2)
write.csv(allen_posteriors, "components/data/figureS10-11.csv")

pander::pandoc.table(allen_priors_xtable,
  caption = "Parameterization range for the uniform priors in the Allen model")

unlink("ricker_process.bugs")
unlink("allen_process.bugs")
unlink("myers_process.bugs")

cboettig/nonparametric-bayes documentation built on May 13, 2019, 2:09 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

cboettig/nonparametric-bayes
Nonparametric Bayes inference for ecological models

In cboettig/nonparametric-bayes: Nonparametric Bayes inference for ecological models

Code

Dependencies & Reproducibility

Data

Dryad Data Archive

Training data description

Training data for sensitivity analyses

Model performance outside the predicted range (Fig S1)

Further sensitivity analysis (Fig S2 - 3)

MCMC analyses

GP MCMC (Fig S4-5)

Ricker Model MCMC (Fig S6-7)

Myers Model MCMC (Fig S8-9) ##

Allen Model MCMC (Fig S10-11)

R Package Documentation

Browse R Packages

We want your feedback!

cboettig/nonparametric-bayes Nonparametric Bayes inference for ecological models

In cboettig/nonparametric-bayes: Nonparametric Bayes inference for ecological models

Code

Dependencies & Reproducibility

Data

Dryad Data Archive

Training data description

Training data for sensitivity analyses

Model performance outside the predicted range (Fig S1)

Further sensitivity analysis (Fig S2 - 3)

MCMC analyses

GP MCMC (Fig S4-5)

Ricker Model MCMC (Fig S6-7)

Myers Model MCMC (Fig S8-9) ##

Allen Model MCMC (Fig S10-11)

R Package Documentation

Browse R Packages

We want your feedback!

cboettig/nonparametric-bayes
Nonparametric Bayes inference for ecological models