In USGS-R/streamMetabolizer: Models for Estimating Aquatic Photosynthesis and Respiration

knitr::opts_chunk$set(echo = TRUE)
options(width=100)

Overview

This vignette shows how to simulate dissolved oxygen 'observations' for the purpose of exploring and testing metabolism models.

Setup

Load streamMetabolizer and some helper packages.

library(streamMetabolizer)
library(dplyr)
library(ggplot2)

Get some data to work with: here we're requesting three days of data at 15-minute resolution. Thanks to Bob Hall for the test data.

dat <- data_metab('3', '15')

Creating a Sim Model

To create a simulation model, you should

Choose a model structure
Choose daily metabolism parameters
Choose the other model specifications
Create the model
Generate predictions (simulations) from the model

1. Choosing the model structure

You can simulate data using any of the GPP and ER functions available to MLE models. Simulations are done by models of type 'sim' but otherwise take very similar arguments to those of an MLE model. Here we'll use a model where ER is a function of temperature.

name_sim_q10 <- mm_name('sim', ER_fun='q10temp')

2. Choosing the daily parameters

To simulate data, you need to specify the daily parameters beforehand. The model structure determines which parameters are needed. There are three good ways to learn which daily parameters you need to specify.

A. Trial and error

To learn about parameter needs by trial and error, simply create the model with the equations you want but without daily inputs, ask for the parameters, and read the error message to get a list of parameters. It's fine to use the defaults for the specifications for now.

mm_sim_q10_trial <- metab(specs(name_sim_q10), dat)
get_params(mm_sim_q10_trial)

Great: we need GPP.daily, ER20, and K600.daily. Now we can pick values for those parameters and put them in a data.frame.

params_sim_q10a <- data.frame(date=as.Date(paste0('2012-09-',18:20)), GPP.daily=2.1, ER20=-5:-3, K600.daily=16)
params_sim_q10a

B. Generate parameters from another model

You can also use fitted parameters from another model as your input for a simulation model. This method could be useful for identifying realistic parameters and/or exploring why a model fitting process didn't work so well.

First fit an MLE model to the same data using the GPP_fun and ER_fun you want. It's fine (again) to use the defaults for the specifications.

mm_mle_q10 <- metab(specs(mm_name('mle', ER_fun='q10temp')), data=dat)

Then ask for the parameters in the right format (without columns for uncertainty or messages).

params_sim_q10b <- get_params(mm_mle_q10, uncertainty='none', messages=FALSE)
params_sim_q10b

C. Look at `?mm_name`

Try it. We put lots of details in the help file. Check out the documentation for the GPP_fun and ER_fun args in particular.

?mm_name

After reading the documentation you'll create a data.frame of the same format as in options A or B.

3. Choosing the specifications

After choosing parameters, the next step is to choose the rest of the specifications. The main difference between sim models and other models is that you can choose values for the probability distributions of the observation and/or process errors. See ?specs for details on the distribution parameters err_obs_sigma, err_obs_phi, err_proc_sigma, and err_proc_phi.

specs_sim_q10 <- specs(name_sim_q10, err_obs_sigma=0, err_proc_sigma=1, K600_daily=NULL, GPP_daily=NULL, ER20=NULL)
specs_sim_q10

4. Creating a model

Now you can create a simulation model much as you would an MLE or Bayesian model. We'll make two models here, one for each of the parameter sets we created above.

mm_sim_q10a <- metab(specs_sim_q10, dat, data_daily=params_sim_q10a)
mm_sim_q10b <- metab(specs_sim_q10, dat, data_daily=params_sim_q10b)

5. Generating predictions

Predictions and simulations are one and the same when your model is of type sim. The output of predict_DO for sim models includes three DO concentration columns. DO.pure is what the DO concentrations would be if the GPP, ER, and K600 parameters exactly described what occurred in the stream. If there's process error in your model, DO.mod will differ from DO.pure in that DO.mod also contains the process error as a fourth driver (on top of GPP, ER, and reaeration) of in-situ DO concentrations. (DO.mod and DO.pure are identical if there's no process error.) Lastly, DO.obs is a simulation of what your sensor might record; it includes everything in DO.mod plus observation error representing inaccuracies in how the sensor reads or records the DO concentration. These three variables are plotted as a muted-color line (DO.pure), a bold dark line (DO.mod), and brightly colored points (DO.obs). DO.pure is mostly hidden behind the others unless the errors are large.

head(predict_DO(mm_sim_q10a))
head(predict_DO(mm_sim_q10b))
plot_DO_preds(mm_sim_q10a, y_var='conc')
plot_DO_preds(mm_sim_q10b, y_var='conc')

Simulating Errors

The main purpose of simulation models is to generate DO 'observations' with error, i.e., noise, to see whether other models can recover the underlying parameters despite the noise.

For this section we'll use a simulation with GPP as a saturating function of light. We'll use method B from above to choose our daily parameters.

specs_sim_sat <- specs(mm_name('sim', GPP_fun='satlight'), err_obs_sigma=0, err_proc_sigma=1, K600_daily=NULL, Pmax=NULL, alpha=NULL, ER_daily=NULL)
params_sim_sat <- get_params(metab(specs(mm_name('mle', GPP_fun='satlight')), data=dat), uncertainty='none', messages=FALSE)

Innovative errors

By default, simulations generate new noise each time you request predictions.

mm_sim_sat_i <- metab(specs_sim_sat, dat, data_daily=params_sim_sat)
plot_DO_preds(mm_sim_sat_i, y_var='conc')
plot_DO_preds(mm_sim_sat_i, y_var='conc')

Fixed errors

Alternatively, you can revise the value of sim_seed to be a number (any number) and then the simulation produces the same noise each time.

mm_sim_sat_f <- metab(revise(specs_sim_sat, sim_seed=47), dat, data_daily=params_sim_sat)
plot_DO_preds(mm_sim_sat_f, y_var='conc')
plot_DO_preds(mm_sim_sat_f, y_var='conc')

Inspecting Models

We'll use a slightly longer dataset here to demonstrate the potential for random noise at the levels of both the observations (every time you run predict_DO()) and the daily parameters (every time you define data_daily).

dat <- data_metab('10', '30')
params <- data.frame(date=as.Date(paste0('2012-09-',18:27)), Pmax=rnorm(10, 6, 2), alpha=rnorm(10, 0.01, 0.001), ER20=rnorm(10, -4, 2), K600.daily=16)
specs <- specs(mm_name('sim', GPP_fun='satlight', ER_fun='q10temp'), err_obs_sigma=0.2, err_proc_sigma=1, K600_daily=NULL, Pmax=NULL, alpha=NULL, ER20=NULL)
mm <- metab(specs, data=dat, data_daily=params)

Sim models print out their parameters with asterisks to denote that the values are fixed rather than fitted.

mm

Sim models produce daily estimates of GPP and ER, which should help in choosing simulation parameters. The GPP and ER predictions have no error bars because they're direct calculations from the daily parameters.

plot_metab_preds(mm)

Multi-Day Simulations

You can also use sim models to simulate variation across many days. Let's start by generating a 60-day timeseries of water temperature, DO.sat, etc. by concatenating 6 copies of 10 days of French Creek data:

dat <- data_metab('10','15')
datlen <- as.numeric(diff(range(dat$solar.time)) + as.difftime(15, units='mins'), units='days')
dat20 <- bind_rows(lapply((0:1)*10, function(add) {
  mutate(dat, solar.time = solar.time + as.difftime(add, units='days'))
}))

You can specify a distribution rather than specific values for GPP, ER, and/or K600 parameters. In fact, this is the default if you don't specify daily data:

sp <- specs(mm_name('sim'))
lapply(unclass(sp)[c('K600_daily','GPP_daily','ER_daily')], function(fun) {
  list(code=attr(fun, 'srcref'), example_vals=fun(n=10))
})

These functions get called to generate new values for K600.daily, GPP.daily, and ER.daily each time you call get_params, predict_metab, or predict_DO. (They'll be the same random values each time if you set sim_seed.)

mm <- metab(sp, dat20, data_daily=NULL)
get_params(mm)[c('date','K600.daily','GPP.daily','ER.daily')]

You can also set err_obs_sigma and other error terms as daily values and/or functions. The defaults are simple numeric values that get replicated to every date, but the values can also be vectors or functions, as with GPP_daily, etc.

sp <- specs('sim', err_obs_sigma=function(n, ...) -0.01*((1:n) - (n/2))^2 + 1, err_proc_sigma=function(n, ...) rnorm(n, 0.1, 0.005), err_proc_phi=seq(0, 1, length.out=20), GPP_daily=3, ER_daily=-4, K600_daily=16)
mm <- metab(sp, dat20)
get_params(mm)
plot_DO_preds(mm)

The above simulation emphasized day-to-day variation in err_obs_sigma. Here's a simulation emphasizing variation in err_proc_sigma and err_proc_phi:

sp <- specs('sim', err_obs_sigma=0.01, err_proc_sigma=function(n, ...) rep(c(0.5, 4), each=10), err_proc_phi=rep(seq(0, 0.8, length.out=10), times=2), GPP_daily=3, ER_daily=-4, K600_daily=16)
mm <- metab(sp, dat20)
get_params(mm)
plot_DO_preds(mm)

The daily parameter functions that you assign in specs() can refer to previous daily parameters in the list. For example, ER_daily can be a function of GPP.daily. Values of GPP.daily may have been specified in the GPP.daily column of data_daily or in the GPP_daily argument to specs(); the ER function should refer to it with its period-separated name, GPP.daily.

sp <- specs('sim', err_obs_sigma=0.01, err_proc_sigma=0.4, K600_daily=16, GPP_daily=function(n, ...) round(rnorm(n, 4, 1), 1), ER_daily=function(GPP.daily, ...) GPP.daily*-2)
mm <- metab(sp, dat20)
get_params(mm)

The K600_daily function can also take advantage of pre-specified model structures relating K to discharge. As of December 2016, the Kb formulation (pool_K600 = 'binned') is the only one available. But it's a good one! See which parameters you can set by calling specs one preliminary time with a Kb model name:

sp <- specs(mm_name('sim', pool_K600='binned', ER_fun='q10temp'), sim_seed=6332)

The new and relevant arguments are K600_lnQ_nodes_centers, K600_lnQ_cnode_meanlog, K600_lnQ_cnode_sdlog, K600_lnQ_nodediffs_meanlog, K600_lnQ_nodediffs_sdlog, and lnK600_lnQ_nodes. The defaults might work just fine for you, and changing lnK600_lnQ_nodes is especially non-recommended. It's probably useful to dial down the noise relating K600.daily to lnK600_lnQ_nodes:

mm <- metab(revise(sp, K600_daily=function(n, K600_daily_predlog, ...) pmax(0, rnorm(n, exp(K600_daily_predlog), 0.4))), dat20)
pars <- get_params(mm)
pars

In this model, even the K~Q relationship is simulated on each call to get_params, predict_metab, or predict_DO. You can inspect the relationship by looking at the K600_eqn attribute to the output of get_params:

attr(pars, 'K600_eqn')

The centers and nodes are the essential pieces of the final piecewise relationship (blue points and line). We can also identify the predictions for specific dates and discharges along that line (purple points) and the K600 params that result from adding noise to those predictions (red points):

KQ <- as.data.frame(attr(pars, 'K600_eqn')[c('K600_lnQ_nodes_centers', 'lnK600_lnQ_nodes')])
Kpred <- mutate(select(pars, date, discharge.daily, K600.daily), K600_daily_predlog=attr(pars, 'K600_eqn')$K600_daily_predlog)
ggplot(KQ, aes(x=K600_lnQ_nodes_centers, y=lnK600_lnQ_nodes)) + geom_line(color='blue') + geom_point(color='blue') + 
  geom_point(data=Kpred, aes(x=log(discharge.daily), y=K600_daily_predlog), color='purple') +
  geom_point(data=Kpred, aes(x=log(discharge.daily), y=log(K600.daily)), color='red')

```

USGS-R/streamMetabolizer documentation built on Aug. 15, 2023, 7:50 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

USGS-R/streamMetabolizer
Models for Estimating Aquatic Photosynthesis and Respiration

In USGS-R/streamMetabolizer: Models for Estimating Aquatic Photosynthesis and Respiration

Overview

Setup

Creating a Sim Model

1. Choosing the model structure

2. Choosing the daily parameters

A. Trial and error

B. Generate parameters from another model

C. Look at `?mm_name`

3. Choosing the specifications

4. Creating a model

5. Generating predictions

Simulating Errors

Innovative errors

Fixed errors

Inspecting Models

Multi-Day Simulations

R Package Documentation

Browse R Packages

We want your feedback!

USGS-R/streamMetabolizer Models for Estimating Aquatic Photosynthesis and Respiration

In USGS-R/streamMetabolizer: Models for Estimating Aquatic Photosynthesis and Respiration

Overview

Setup

Creating a Sim Model

1. Choosing the model structure

2. Choosing the daily parameters

A. Trial and error

B. Generate parameters from another model

C. Look at ?mm_name

3. Choosing the specifications

4. Creating a model

5. Generating predictions

Simulating Errors

Innovative errors

Fixed errors

Inspecting Models

Multi-Day Simulations

R Package Documentation

Browse R Packages

We want your feedback!

USGS-R/streamMetabolizer
Models for Estimating Aquatic Photosynthesis and Respiration

C. Look at `?mm_name`