
Model-based analysis of warning signals

We load the package and the data set from the individual-based simulations. To these data we fit both the OU (null) and LSN (test) models.
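Concretely, this step might look like the following sketch. Here `stability_model` is assumed to be the package's fitting routine, and `ibm_data` is a placeholder name for the simulation data set; neither name is confirmed by the text above.

```r
library(earlywarning)

# "ibm_data" is a placeholder for the individual-based simulation data set
data(ibm_data)

# Model A: constant-stability (OU) null model
# Model B: changing-stability (LSN) test model
A <- stability_model(ibm_data, "OU")
B <- stability_model(ibm_data, "LSN")
```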

Comparing the models by the bootstrap procedure is computationally intensive. The compare function simulates under each model and then estimates both models on both simulated data sets (four model fits in total). As only the LSN fit is computationally intensive, this takes about twice as long as the fitting step for model B above. We will need to repeat this many times to build up a bootstrap distribution, so it is convenient to run this step in parallel over multiple processors or a cluster of machines.
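A single run of this comparison might look like the following sketch, where `A` and `B` stand for the fitted null and test models:

```r
# One replicate: simulate a data set under each fitted model, then
# refit both models to both simulated data sets (four model fits)
rep <- compare(A, B)
```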

Parallelization of the comparison

Fortunately, R has a diverse array of parallel tools; Schmidberger et al. (2009) give a nice review. Here we set up the flexible snowfall parallel environment, which can run in serial, on a multicore chip, or over a cluster of machines.
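A minimal snowfall setup might look like this; the number of CPUs is an arbitrary choice for illustration.

```r
library(snowfall)

# parallel = FALSE would run the same code serially; on a cluster,
# the machine list can be passed via sfInit's socketHosts argument
sfInit(parallel = TRUE, cpus = 4)
sfLibrary(earlywarning)  # load the package on each worker
sfExportAll()            # make the fitted models available to the workers
```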

Rather than build this choice into the early warning signals package, the user can choose whatever looping implementation they prefer to compute replicates of the comparison. Our results are cached in the package, so anyone seeking to explore this example can run data(example_analysis) instead. The actual analysis takes just one line:
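That line might be sketched as follows, where `A` and `B` are the fitted null and test models and the replicate count of 500 is an arbitrary illustration:

```r
# Each replicate runs compare(A, B); sfLapply distributes the
# replicates over the workers initialized by sfInit
reps <- sfLapply(1:500, function(i) compare(A, B))
```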

Visualizing the distributions

Once this step is finished, the remaining analysis is just a matter of shuffling and plotting the data. We extract the likelihood ratios (deviances) from the null and test simulations with a simple function.
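Such a function might be sketched as follows. We assume (this structure is an assumption, not stated above) that each replicate stores the log-likelihood of every simulate/fit combination under two-letter codes such as `AA` and `AB`:

```r
library(plyr)

# -2 * log likelihood ratio (deviance) for data simulated under the
# null model ("null") and under the test model ("test")
lik_ratios <- function(reps)
  ldply(reps, function(rep)
    data.frame(simulation = c("null", "test"),
               value = c(-2 * (rep$AA - rep$AB),
                         -2 * (rep$BA - rep$BB))))

dists <- lik_ratios(reps)
```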

We can visualize these data as overlapping distributions,

or as a beanplot.
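Both plots might be produced along these lines, assuming the deviances live in a data frame `dists` with columns `value` and `simulation` (an assumed layout):

```r
library(ggplot2)

# Overlapping density curves of the deviance under each simulation
ggplot(dists) +
  geom_density(aes(value, fill = simulation), alpha = 0.6)

# The same comparison as a beanplot
library(beanplot)
beanplot(value ~ simulation, data = dists)
```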

Generating the ROC Curve

We can make the ROC curve by calculating the true and false positive rates from the overlap of these distributions using roc_data. This gives us a data frame of threshold values for our statistic and the corresponding false and true positive rates; plotting those rates against each other produces the ROC curve.
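In code this might look like the following sketch; the column names of roc_data's output are assumptions:

```r
# Threshold values with their false and true positive rates
roc <- roc_data(dists)

# Plot true positive rate against false positive rate
library(ggplot2)
ggplot(roc) +
  geom_line(aes(False.positives, True.positives))
```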

Parameter distributions

We can also look at the bootstrap distributions of the parameters. Another helper function reformats these data from the reps list. The fit column uses a two-letter code to indicate first which model was used to simulate the data and then which model was fit to the data. For instance, AB means the data were simulated from model A (the null, OU model) but fit to model B.
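That reformatting step might be sketched as follows; the helper's name used here, `bootstrap_pars`, is hypothetical:

```r
# Hypothetical helper: returns a long-format data frame with columns
# such as fit (two-letter code), parameter, and value
pars <- bootstrap_pars(reps)

# e.g. fit == "AB": simulated under A (OU null), fit to B (LSN)
head(pars)
```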

There are many options for visualizing this relatively high-dimensional data, which we can easily explore with a few commands from the ggplot2 package. For instance, we can look at the average and range of parameters estimated in the bootstraps of each model:
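One sketch of such a plot, assuming a long-format data frame `pars` with columns `fit`, `parameter`, and `value` (assumed names):

```r
library(ggplot2)

# Mean and range of each bootstrapped parameter estimate, by fit
ggplot(pars, aes(fit, value)) +
  stat_summary(fun = mean, fun.min = min, fun.max = max,
               geom = "pointrange") +
  facet_wrap(~ parameter, scales = "free_y")
```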

Visualizing the simulation replicates

The reps list also contains the simulation data from each replicate.
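One way to look at these, assuming a hypothetical helper that stacks the simulated time series into a data frame with columns `time`, `value`, `rep`, and `simulation`:

```r
library(ggplot2)

# "get_sims" is a hypothetical name for the extraction helper
sims <- get_sims(reps)

# Overlay the replicate trajectories simulated under each model
ggplot(sims) +
  geom_line(aes(time, value, group = rep), alpha = 0.2) +
  facet_wrap(~ simulation)
```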

Summary

The same analysis can be repeated for the other data sets used in the study. The analysis described above is essentially six steps, each corresponding to a single line of code:

  1. fit the null model,
  2. fit the test model,
  3. compute the observed likelihood ratio,
  4. bootstrap both models,
  5. extract the likelihood ratio distributions from the bootstraps, and
  6. convert those distributions into error rates.

For instance, we can run these six steps on the chemostat data with:
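A sketch of those six lines follows; the chemostat data set name and the helper names for steps 5 and 6 are assumptions based on the description above:

```r
data(chemostat)                                     # load the data
A <- stability_model(chemostat, "OU")               # 1. fit the null model
B <- stability_model(chemostat, "LSN")              # 2. fit the test model
-2 * (logLik(A) - logLik(B))                        # 3. observed likelihood ratio
reps <- sfLapply(1:500, function(i) compare(A, B))  # 4. bootstrap
dists <- lik_ratios(reps)                           # 5. likelihood ratio distributions
roc <- roc_data(dists)                              # 6. convert to error rates
```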

We highlighted two additional functions that format the parameter data and simulation results so that they can be plotted or explored further.



cboettig/earlywarning documentation built on May 13, 2019, 2:07 p.m.