knitr::opts_chunk$set(echo = TRUE) options(width=100)
This vignette shows how to simulate dissolved oxygen 'observations' for the purpose of exploring and testing metabolism models.
Load streamMetabolizer and some helper packages.
library(streamMetabolizer) library(dplyr) library(ggplot2)
Get some data to work with: here we're requesting three days of data at 15-minute resolution. Thanks to Bob Hall for the test data.
dat <- data_metab('3', '15')
To create a simulation model, you should
You can simulate data using any of the GPP and ER functions available to MLE
models. Simulations are done by models of type 'sim'
but otherwise take very
similar arguments to those of an MLE model. Here we'll use a model where ER is a
function of temperature.
name_sim_q10 <- mm_name('sim', ER_fun='q10temp')
To simulate data, you need to specify the daily parameters beforehand. The model structure determines which parameters are needed. There are three good ways to learn which daily parameters you need to specify.
To learn about parameter needs by trial and error, simply create the model with the equations you want but without daily inputs, ask for the parameters, and read the error message to get a list of parameters. It's fine to use the defaults for the specifications for now.
mm_sim_q10_trial <- metab(specs(name_sim_q10), dat) get_params(mm_sim_q10_trial)
Great: we need GPP.daily
, ER20
, and K600.daily
. Now we can pick values for
those parameters and put them in a data.frame.
params_sim_q10a <- data.frame(date=as.Date(paste0('2012-09-',18:20)), GPP.daily=2.1, ER20=-5:-3, K600.daily=16) params_sim_q10a
You can also use fitted parameters from another model as your input for a simulation model. This method could be useful for identifying realistic parameters and/or exploring why a model fitting process didn't work so well.
First fit an MLE model to the same data using the GPP_fun
and ER_fun
you
want. It's fine (again) to use the defaults for the specifications.
mm_mle_q10 <- metab(specs(mm_name('mle', ER_fun='q10temp')), data=dat)
Then ask for the parameters in the right format (without columns for uncertainty or messages).
params_sim_q10b <- get_params(mm_mle_q10, uncertainty='none', messages=FALSE) params_sim_q10b
?mm_name
Try it. We put lots of details in the help file. Check out the documentation for
the GPP_fun
and ER_fun
args in particular.
?mm_name
After reading the documentation you'll create a data.frame of the same format as in options A or B.
After choosing parameters, the next step is to choose the rest of the
specifications. The main difference between sim models and other models is that
you can choose values for the probability distributions of the observation
and/or process errors. See ?specs
for details on the distribution parameters
err_obs_sigma
, err_obs_phi
, err_proc_sigma
, and err_proc_phi
.
specs_sim_q10 <- specs(name_sim_q10, err_obs_sigma=0, err_proc_sigma=1, K600_daily=NULL, GPP_daily=NULL, ER20=NULL) specs_sim_q10
Now you can create a simulation model much as you would an MLE or Bayesian model. We'll make two models here, one for each of the parameter sets we created above.
mm_sim_q10a <- metab(specs_sim_q10, dat, data_daily=params_sim_q10a) mm_sim_q10b <- metab(specs_sim_q10, dat, data_daily=params_sim_q10b)
Predictions and simulations are one and the same when your model is of type sim
. The output of predict_DO
for sim
models includes three DO concentration columns. DO.pure
is what the DO concentrations would be if the GPP, ER, and K600 parameters exactly described what occurred in the stream. If there's process error in your model, DO.mod
will differ from DO.pure
in that DO.mod
also contains the process error as a fourth driver (on top of GPP, ER, and reaeration) of in-situ DO concentrations. (DO.mod
and DO.pure
are identical if there's no process error.) Lastly, DO.obs
is a simulation of what your sensor might record; it includes everything in DO.mod
plus observation error representing inaccuracies in how the sensor reads or records the DO concentration. These three variables are plotted as a muted-color line (DO.pure
), a bold dark line (DO.mod
), and brightly colored points (DO.obs
). DO.pure
is mostly hidden behind the others unless the errors are large.
head(predict_DO(mm_sim_q10a)) head(predict_DO(mm_sim_q10b)) plot_DO_preds(mm_sim_q10a, y_var='conc') plot_DO_preds(mm_sim_q10b, y_var='conc')
The main purpose of simulation models is to generate DO 'observations' with error, i.e., noise, to see whether other models can recover the underlying parameters despite the noise.
For this section we'll use a simulation with GPP as a saturating function of light. We'll use method B from above to choose our daily parameters.
specs_sim_sat <- specs(mm_name('sim', GPP_fun='satlight'), err_obs_sigma=0, err_proc_sigma=1, K600_daily=NULL, Pmax=NULL, alpha=NULL, ER_daily=NULL) params_sim_sat <- get_params(metab(specs(mm_name('mle', GPP_fun='satlight')), data=dat), uncertainty='none', messages=FALSE)
By default, simulations generate new noise each time you request predictions.
mm_sim_sat_i <- metab(specs_sim_sat, dat, data_daily=params_sim_sat) plot_DO_preds(mm_sim_sat_i, y_var='conc') plot_DO_preds(mm_sim_sat_i, y_var='conc')
Alternatively, you can revise the value of sim_seed
to be a number (any
number) and then the simulation produces the same noise each time.
mm_sim_sat_f <- metab(revise(specs_sim_sat, sim_seed=47), dat, data_daily=params_sim_sat) plot_DO_preds(mm_sim_sat_f, y_var='conc') plot_DO_preds(mm_sim_sat_f, y_var='conc')
We'll use a slightly longer dataset here to demonstrate the potential for random
noise at the levels of both the observations (every time you run predict_DO()
)
and the daily parameters (every time you define data_daily
).
dat <- data_metab('10', '30') params <- data.frame(date=as.Date(paste0('2012-09-',18:27)), Pmax=rnorm(10, 6, 2), alpha=rnorm(10, 0.01, 0.001), ER20=rnorm(10, -4, 2), K600.daily=16) specs <- specs(mm_name('sim', GPP_fun='satlight', ER_fun='q10temp'), err_obs_sigma=0.2, err_proc_sigma=1, K600_daily=NULL, Pmax=NULL, alpha=NULL, ER20=NULL) mm <- metab(specs, data=dat, data_daily=params)
Sim models print out their parameters with asterisks to denote that the values are fixed rather than fitted.
mm
Sim models produce daily estimates of GPP and ER, which should help in choosing simulation parameters. The GPP and ER predictions have no error bars because they're direct calculations from the daily parameters.
plot_metab_preds(mm)
You can also use sim
models to simulate variation across many days. Let's start by generating a 60-day timeseries of water temperature, DO.sat, etc. by concatenating 6 copies of 10 days of French Creek data:
dat <- data_metab('10','15') datlen <- as.numeric(diff(range(dat$solar.time)) + as.difftime(15, units='mins'), units='days') dat20 <- bind_rows(lapply((0:1)*10, function(add) { mutate(dat, solar.time = solar.time + as.difftime(add, units='days')) }))
You can specify a distribution rather than specific values for GPP, ER, and/or K600 parameters. In fact, this is the default if you don't specify daily data:
sp <- specs(mm_name('sim')) lapply(unclass(sp)[c('K600_daily','GPP_daily','ER_daily')], function(fun) { list(code=attr(fun, 'srcref'), example_vals=fun(n=10)) })
These functions get called to generate new values for K600.daily, GPP.daily, and ER.daily each time you call get_params
, predict_metab
, or predict_DO
. (They'll be the same random values each time if you set sim_seed
.)
mm <- metab(sp, dat20, data_daily=NULL) get_params(mm)[c('date','K600.daily','GPP.daily','ER.daily')]
You can also set err_obs_sigma
and other error terms as daily values and/or functions. The defaults are simple numeric values that get replicated to every date, but the values can also be vectors or functions, as with GPP_daily
, etc.
sp <- specs('sim', err_obs_sigma=function(n, ...) -0.01*((1:n) - (n/2))^2 + 1, err_proc_sigma=function(n, ...) rnorm(n, 0.1, 0.005), err_proc_phi=seq(0, 1, length.out=20), GPP_daily=3, ER_daily=-4, K600_daily=16) mm <- metab(sp, dat20) get_params(mm) plot_DO_preds(mm)
The above simulation emphasized day-to-day variation in err_obs_sigma
. Here's a simulation emphasizing variation in err_proc_sigma
and err_proc_phi
:
sp <- specs('sim', err_obs_sigma=0.01, err_proc_sigma=function(n, ...) rep(c(0.5, 4), each=10), err_proc_phi=rep(seq(0, 0.8, length.out=10), times=2), GPP_daily=3, ER_daily=-4, K600_daily=16) mm <- metab(sp, dat20) get_params(mm) plot_DO_preds(mm)
The daily parameter functions that you assign in specs()
can refer to previous daily parameters in the list. For example, ER_daily can be a function of GPP.daily
. Values of GPP.daily
may have been specified in the GPP.daily
column of data_daily
or in the GPP_daily
argument to specs()
; the ER function should refer to it with its period-separated name, GPP.daily
.
sp <- specs('sim', err_obs_sigma=0.01, err_proc_sigma=0.4, K600_daily=16, GPP_daily=function(n, ...) round(rnorm(n, 4, 1), 1), ER_daily=function(GPP.daily, ...) GPP.daily*-2) mm <- metab(sp, dat20) get_params(mm)
The K600_daily function can also take advantage of pre-specified model structures relating K to discharge. As of December 2016, the Kb
formulation (pool_K600 = 'binned'
) is the only one available. But it's a good one! See which parameters you can set by calling specs
one preliminary time with a Kb model name:
sp <- specs(mm_name('sim', pool_K600='binned', ER_fun='q10temp'), sim_seed=6332)
The new and relevant arguments are K600_lnQ_nodes_centers
, K600_lnQ_cnode_meanlog
, K600_lnQ_cnode_sdlog
, K600_lnQ_nodediffs_meanlog
, K600_lnQ_nodediffs_sdlog
, and lnK600_lnQ_nodes
. The defaults might work just fine for you, and changing lnK600_lnQ_nodes
is especially non-recommended. It's probably useful to dial down the noise relating K600.daily to lnK600_lnQ_nodes:
mm <- metab(revise(sp, K600_daily=function(n, K600_daily_predlog, ...) pmax(0, rnorm(n, exp(K600_daily_predlog), 0.4))), dat20) pars <- get_params(mm) pars
In this model, even the K~Q relationship is simulated on each call to get_params
, predict_metab
, or predict_DO
. You can inspect the relationship by looking at the K600_eqn
attribute to the output of get_params
:
attr(pars, 'K600_eqn')
The centers and nodes are the essential pieces of the final piecewise relationship (blue points and line). We can also identify the predictions for specific dates and discharges along that line (purple points) and the K600 params that result from adding noise to those predictions (red points):
KQ <- as.data.frame(attr(pars, 'K600_eqn')[c('K600_lnQ_nodes_centers', 'lnK600_lnQ_nodes')]) Kpred <- mutate(select(pars, date, discharge.daily, K600.daily), K600_daily_predlog=attr(pars, 'K600_eqn')$K600_daily_predlog) ggplot(KQ, aes(x=K600_lnQ_nodes_centers, y=lnK600_lnQ_nodes)) + geom_line(color='blue') + geom_point(color='blue') + geom_point(data=Kpred, aes(x=log(discharge.daily), y=K600_daily_predlog), color='purple') + geom_point(data=Kpred, aes(x=log(discharge.daily), y=log(K600.daily)), color='red')
```
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.