library("methods") library("rmarkdown") render("manuscript.Rmd")
library("knitr") basename <- "supplement" opts_chunk$set(fig.path = paste("components/figure/", basename, "-", sep=""), cache.path = paste("components/cache/", basename, "/", sep="")) opts_chunk$set(cache = 2) opts_chunk$set(tidy=FALSE, warning=FALSE, message=FALSE, comment = NA, verbose = TRUE, echo=FALSE) opts_chunk$set(dev.opts=c(version="1.7"), dev="pdf")
\tableofcontents
\appendix \renewcommand{\thefigure}{S\arabic{figure}} \renewcommand{\thetable}{S\arabic{table}} \setcounter{figure}{0} \setcounter{table}{0}
All code used in producing this analysis has been embedded into the
manuscript sourcefile using the Dynamic Documentation tool, knitr
for the R language [@Xie_2013], available at
github.com/cboettig/nonparametric-bayes/
To help make the mathematical and computational approaches presented here more accessible, we provide a free and open source (MIT License) R package that implements the GPDP process as it is presented here. Users should note that at this time, the R package has been developed and tested explicitly for this analysis and is not yet intended as a general purpose tool. The manuscript source-code described above illustrates how these functions are used in this analysis. This package can be installed following the directions above.
The code provided should run on any common platform (Windows, Mac or Linux)
that has R and the necessary R packages installed (including support for
the jags Gibbs sampler). The DESCRIPTION file of our R package, nonparametricbayes
,
lists all the software required to use these methods. Additional software requirements
for the other comparisons shown here, such as the Gibbs sampling for the parametric models,
are listed under the Suggested packages list.
Nonetheless, installing the dependencies needed is not a trivial task, and may become more difficult over time as software continues to evolve. To facilitate reuse, we also provide a Dockerfile and Docker image that can be used to replicate and explore the analyses here by providing a copy of the computational environment we have used, with all software installed. Docker software (see docker.com) runs on most platforms as well as cloud servers. Use the command:
docker run -dP cboettig/nonparametric-bayes
to launch an RStudio instance with the necessary software already installed. See the Rocker-org project, github.com/rocker-org for more detailed documentation on using Docker with R.
While the data can be regenerated using the code provided, for convenience
CSV files of the data shown in each graph are made available on Dryad,
along with the source .Rmd
files for the manuscript and supplement
that document them.
Each of our models $f(S_t)$ must be estimated from training data, which
we simulate from the Allen model with parameters $r = $ r p[1]
,
$K =$ r p[2]
, $C =$ r p[3]
, and $\sigma_g =$ r sigma_g
for $T=$ r Tobs
timesteps, starting at initial condition $X_0 = $ r Xo
. The training
data can be seen in Figure 1 and found in the table figure1.csv
.
A further 96 unique randomly generated training data sets are generated for the sensitivity analysis, as described in the main text. The code provided replicates the generation of these sets.
Figure S1 illustrates the performance of the GP and parametric models outside the observed training data. The mean trajectory under the underlying model is shown by the black dots, while the corresponding prediction made by the model shown by the box and whiskers plots. Predictions are based on the true expected value in the previous time step. Predicted distributions that lie entirely above the expected dynamics indicate the expectation of stock sizes higher than what is actually expected. The models differ both in their expectations and their uncertainty (colored bands show two standard deviations away). Note that the GP is particularly uncertain about the dynamics relative to structurally incorrect models like the Ricker.
y <- numeric(8) y[1] <- 4.5 for(t in 1:(length(y)-1)) y[t+1] = z_g() * f(y[t], h=0, p=p) # predicts means, does not reflect uncertainty estimate! crash_data <- step_ahead_posteriors(y) crash_data <- subset(crash_data, variable %in% c("GP", "Allen", "Ricker", "Myers")) ggplot(crash_data) + geom_boxplot(aes(as.factor(as.integer(time)), value, fill = variable, col=variable), outlier.size=1, position="identity") + # geom_line(aes(time, value, col = variable, # group=interaction(L1,variable))) + geom_point(aes(time, stock), size = 3) + scale_fill_manual(values=colorkey[c("GP", "Allen", "Ricker", "Myers")]) + scale_colour_manual(values=colorkey[c("GP", "Allen", "Ricker", "Myers")]) + facet_wrap(~variable) + theme(legend.position="none") + xlab("time") + ylab("stock size") write.csv(crash_data, "components/data/figureS1.csv")
## Forgo the EML generation, the Rmd files provide better documentation at this stage. library(EML) me <- "Carl Boettiger <cboettig@ropensci.org>" models <- levels(crash_data$variable) names(models) <- models col.defs <- c("time", "fish stock", "model", "value", "replicate") unit.defs <- list("number", "number", models, "number", "number") col.classes <- sapply(crash_data, class) eml_write("components/data/figureS1.csv", col.classes = col.classes, col.defs=col.defs,unit.defs=unit.defs,creator=me, title="Figure S1", file="components/data/README_S1.xml")
\newpage
We perform 2 sensitivity analyses. The first focuses on illustrating the robustness of the approach to the two parameters that most influence stochastic transitions across the tipping point: the position of the Allee threshold and the scale of the noise (Fig S2).
Changing the intensity of the stochasticity or the distance between stable and unstable steady states does not impact the performance of the GP relative to the optimal solution obtained from the true model and true parameters. The parametric models are more sensitive to this difference. Large values of $\sigma$ relative to the distance between the stable and unstable point increases the chance of a stochastic transition below the tipping point. More precisely, if we let $L$ be the distance between the stable and unstable steady states, then the probability that fluctuations drive the population across the unstable steady state scales as
$\exp\left(\frac{L^2}{\sigma^2}\right)$
(see @Gardiner2009 or @Mangel2006 for the derivation).
Thus, the impact of using a model that underestimates the risk of harvesting beyond the critical point is considerable, since this such a situation occurs more often. Conversely, with large enough distance between the optimal escapement and unstable steady state relative to $\sigma$, the chance of a transition becomes vanishingly small and all models can be estimated near-optimally. Models that underestimate the cost incurred by population sizes fluctuating significantly below the optimal escapement level will not perform poorly as long as those fluctuations are sufficiently small. Fig S2 shows the net present value of managing under teh GPDP remains close to the optimal value (ratio of 1), despite varying across either noise level or the the position of the allee threshold.
plot_sigmas <- ggplot(vary_sigma, aes(noise, value)) + geom_point(size=2) + ylab("Net present value") + xlab("Level of growth stochasticity") plot_allees <- ggplot(vary_allee, aes(allee, value)) + geom_point(size=2)+ ylab("Net present value") + xlab("Allee threshold stock size") multiplot(plot_sigmas, plot_allees, cols=2) write.csv(vary_sigma, "components/data/figureS2a.csv") write.csv(vary_allee, "components/data/figureS2b.csv")
The Latin hypercube approach systematically varies all combinations of parameters, providing a more general test than varying only one parameter at a time. We loop across eight replicates of three different randomly generated parameter sets for each of two different generating models (Allen and Myers) over two different noise levels (0.01 and 0.05), for a total of 8 x 3 x 2 x 2 = 96 scenarios. The Gaussian Process performs nearly optimally in each case, relative to the optimal solution with no parameter or model uncertainty (Figure S10, appendix).
source("components/sensitivity.R") models <- c("Myers","Allen") parameters <- list( Myers = list( c(r=1.5 + rnorm(1, 0, 1), theta=2 + rnorm(1, 0, 1), K=10 + rnorm(1, 0, 1)), c(r=1.5 + rnorm(1, 0, 1), theta=2.5 + rnorm(1, 0, 1), K=10 + rnorm(1, 0, 1))), Allen = list( c(r=2 + rnorm(1, 0, 1), K=10 + rnorm(1, 0, 1), C=4 + rnorm(1, 0, 1)), c(r=2 + rnorm(1, 0, 1), K=10 + rnorm(1, 0, 1), C=4 + rnorm(1, 0, 1))) ) nuisance_pars <- c("sigma_g") nuisance_values <- list(sigma_g = c(0.01, 0.05))
model <- "Allen" allen1.01 <- sensitivity(model, parameters = parameters[[model]][[1]], nuisance = c(sigma_g = nuisance_values$sigma_g[1]), seed=c(1111, 2222, 3333)) allen2.01 <- sensitivity(model, parameters = parameters[[model]][[2]], nuisance = c(sigma_g = nuisance_values$sigma_g[1]), seed=c(1111, 2222, 3333)) allen1.05 <- sensitivity(model, parameters = parameters[[model]][[1]], nuisance = c(sigma_g = nuisance_values$sigma_g[2]), seed=c(1111, 2222, 3333)) allen2.05 <- sensitivity(model, parameters = parameters[[model]][[2]], nuisance = c(sigma_g = nuisance_values$sigma_g[2]), seed=c(1111, 2222, 3333))
model <- "Myers" Myers1.01 <- sensitivity(model, parameters = parameters[[model]][[1]], nuisance = c(sigma_g = nuisance_values$sigma_g[1]), seed=c(1111, 2222, 3333)) Myers2.01 <- sensitivity(model, parameters = parameters[[model]][[2]], nuisance = c(sigma_g = nuisance_values$sigma_g[1]), seed=c(1111, 2222, 3333)) Myers1.05 <- sensitivity(model, parameters = parameters[[model]][[1]], nuisance = c(sigma_g = nuisance_values$sigma_g[2]), seed=c(1111, 2222, 3333)) Myers2.05 <- sensitivity(model, parameters = parameters[[model]][[2]], nuisance = c(sigma_g = nuisance_values$sigma_g[2]), seed=c(1111, 2222, 3333)) ## Assemble into data.frame allen_dat <- rbind(allen1.01, allen1.05, allen2.01, allen2.05) m <- rbind(Myers1.01, Myers1.05, Myers2.01, Myers2.05) myers_dat <- m[c(1:2,4,3,5:8)] names(myers_dat) <- names(allen_dat) model_dat <- rbind(allen_dat, myers_dat) dat <- model_dat dat$pars.r <- factor(dat$pars.r, labels=c("A", "B", "C", "D")) dat <- dat[c(1:2,5:6, 8, 7)] dat$noise <- factor(dat$noise) names(dat) <- c("model", "parameters", "replicate", "simulation", "noise", "value") ## Extract acutal parameter values corresponding to each parameter set p1 = as.numeric(levels(factor(model_dat$pars.r))) p2 = as.numeric(levels(factor(model_dat$pars.K))) p3 = as.numeric(levels(factor(model_dat$pars.C))) set.A = c(r = p1[1], K = p2[1], theta = p3[1]) set.B = c(r = p1[2], K = p2[2], theta = p3[2]) set.C = c(r = p1[3], K = p2[3], C = p3[3]) set.D = c(r = p1[4], K = p2[4], C = p3[4]) AllenParameterSets <- rbind(set.A, set.B) MyersParameterSets <- rbind(set.C, set.D) sensitivity_dat <- dat
ggplot(sensitivity_dat) + geom_histogram(aes(value, fill=parameters)) + xlim(0,1.0) + theme_bw() + xlab("value as fraction of the optimal") + facet_grid(noise~model) write.csv(sensitivity_dat, "components/data/figureS3.csv")
pandoc.table(AllenParameterSets, caption="Randomly chosen parameter sets for the Allen models in Figure S3." )
pandoc.table(MyersParameterSets, caption="Randomly chosen parameter sets for the Myers models in Figure S3." )
\newpage
This section provides figures and tables showing the traces from each of the MCMC runs used to estimate the parameters of the models presented, along with the resulting posterior distributions for each parameter. Priors usually appear completely flat when shown against the posteriors, but are summarized by tables the parameters of their corresponding distributions for each case.
gp_assessment_plots$traces_plot
gp_assessment_plots$posteriors_plot write.csv(tgp_dat,"components/data/figureS4-5.csv")
\newpage \newpage
ggplot(ricker_posteriors) + geom_line(aes(index, value)) + # priors, need to fix order though facet_wrap(~ variable, scale="free_y", ncol=1)
ggplot(ricker_posteriors, aes(value)) + stat_density(geom="path", position="identity") + facet_wrap(~ variable, scale="free", ncol=2) write.csv(ricker_posteriors, "components/data/figureS6-7.csv")
pander::pandoc.table(ricker_priors_xtable, caption = "Parameterization range for the uniform priors in the Ricker model")
\newpage \newpage
ggplot(myers_posteriors) + geom_line(aes(index, value)) + # priors, need to fix order though facet_wrap(~ variable, scale="free_y", ncol=1)
ggplot(myers_posteriors, aes(value)) + stat_density(geom="path", position="identity") + facet_wrap(~ variable, scale="free", ncol=2) write.csv(myers_posteriors, "components/data/figureS8-9.csv")
pander::pandoc.table(myers_priors_xtable, caption = "Parameterization range for the uniform priors in the Myers model")
\newpage \newpage
ggplot(allen_posteriors) + geom_line(aes(index, value)) + # priors, need to fix order though facet_wrap(~ variable, scale="free_y", ncol=1)
ggplot(allen_posteriors, aes(value)) + stat_density(geom="path", position="identity") + facet_wrap(~ variable, scale="free", ncol=2) write.csv(allen_posteriors, "components/data/figureS10-11.csv")
pander::pandoc.table(allen_priors_xtable, caption = "Parameterization range for the uniform priors in the Allen model")
unlink("ricker_process.bugs") unlink("allen_process.bugs") unlink("myers_process.bugs")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.