In TESA-workshops/TESAsamSim: Closed-loop salmon simulator for 2021 TESA workshop

This document illustrates how to simulate some stock-recruitment data using the TESAsamSim R package and then fit a linear state-space model (aka Dynamic Linear model) with potentially time-varying parameters to it. The model is fit via Maximum likelihood followed by Kalman filtering and smoothing using a wrapper for the dlm R package (see here). The wrapper is based in part on code generously provided by Cody Szuwalski (NOAA). For this illustrative example the simulations are based on the population characteristics of Interior Fraser River coho salmon.

First install necessary packages and load Interior Fraser Coho data used to condition the simulations.

## Check if necessary packages are available and install if necessary
listOfPackages <- c("here", "parallel", "doParallel", "foreach", 
                    "tidyverse", "tictoc", "TESAsamSim")
newPackages <- listOfPackages[!(listOfPackages %in% 
                                  installed.packages()[ , "Package"])]
if (length(newPackages)) {
  install.packages(newPackages)
}
lapply(listOfPackages, require, character.only = TRUE)

# load wrapper function to fit dynamic linear models
source(here("dlm-wrapper.R"))

## Load relevant input data
# CU-specific parameters
cuPar <- read.csv(here("data", "IFCohoPars", "cohoCUpars.csv"),
                  stringsAsFactors=F)

# Stock-recruit and catch data that are used to populate the simulation priming/conditioning period
srDat <- read.csv(here("data", "IFCohoPars", "cohoRecDatTrim.csv"),
                  stringsAsFactors=F)

Next, simulate some stock-recruitment data based on the parameters specified in cuPar and the scenarios specified in simPar which include whether population productivity or capacity are stationary or time-varying. To do this use the genericRecoverySim() function. For our illustrative purposes we only need one Monte Carlo trial, but the simulator needs to run at least two, and we'll do this separately for each scenario (row) in simPar which are different combinations of stable or declining productivity and capacity.

# Simulation run parameters describing different scenarios
simPar <- read.csv(here("data", "IFCohoPars",
                         "cohoSimPar.csv"),
                       stringsAsFactors = F)

# Directory where simulated data will be saved
scenNames <- unique(simPar$scenario)
dirNames <- sapply(scenNames, function(x) paste(x, unique(simPar$species),
                                                sep = "_"))
# loop through each scenario and simulate data
for (i in 1:4){
  genericRecoverySim(simPar=simPar[i,], cuPar=cuPar,  srDat=srDat,
            variableCU=FALSE, ricPars=NULL, #ricPars,  #cuCustomCorrMat = cuCustomCorrMat,
            dirName="example", nTrials=2, makeSubDirs=FALSE, random=FALSE)}

In this example the outputs of the simulations are separately stored in ./outputs/simData/example, as a .csv for each scenario. So let's load one of them and since the simulator generates simulated data for 5 populations (Conservation Units) and 2 Monte Carlo trials lets just grab the output from one of them.

simDataStationary <- readRDS(here("outputs", "simData", "example",
                         "stationary_ref_CUsrDat.RData"))$srDatout

# simDataStationary <- read.csv(here("outputs", "simData", "example",
#                          "stationary_ref_CUsrDat.csv"),
#                        stringsAsFactors = F)

simDataStationaryCU1 <- simDataStationary %>% 
  filter(CU==1 & iteration ==1)

Now we can try fitting a Dynamic Linear Model to the simulated stock-recruitment data. The model in this case is a linearized Ricker stock-recruit model with log(recruits/spawner) as the dependent variable, log(recruits/spawner) at low spawner abundance as the intercept (alpha), and a density dependence term as the slope (beta which is the inverse of capacity). To fit the model use a wrapper function we wrote for the workshop fitDLM() which calls functions from the dlmpackage. We can specify whether the model estimate alpha and/or beta modeled as dynamic latent states that follow a random walk with TRUE/FALSE statements in the function call.

For now we will set the model to estimate time-varying alpha alpha_vary=TRUE but not beta beta_vary=FALSE, even thought the simulated data are based on a stationary alpha and beta.

# rename a couple of columns to play nice with fitDLM. 
colnames(simDataStationaryCU1)[c(2,4,5,8,9)] <- c("byr","spwn", "rec", "alpha_true", "beta_true")

# fit DLM
dlm_model <- fitDLM(data = simDataStationaryCU1,
              alpha_vary = TRUE,
              beta_vary = FALSE)

The function fitDLM returns a list with two elements. The first element results is a data frame containing the original simulated data plus Kalman filtered and smoothed estimates of alpha and beta over time. The second element is the AICc of the model though it should be noted that we have some unanswered questions about whether this is calculated appropriately.

And then we can plot the simulated stock-recruitment data (with points color-coded by year) and compare the true and estimated values of alpha (intercept) and beta (slope).

plotDLM(dlm_model)

No bad, in this case the model does a good job recovering the true (non-time-varying) parameter estimates.

Now let see how the model does with a population that has time varying productivity (alpha) which increases threefold over the last 30 year of the simulation.

# first read in simulate data for scenario with increasing productivity
# simDataIncreaseProd <- read.csv(here("outputs", "simData", "example",
#                          "increaseProd_ref_CUsrDat.csv"),
#                        stringsAsFactors = F)

simDataIncreaseProd <- readRDS(here("outputs", "simData", "example",
                         "increaseProd_ref_CUsrDat.RData"))$srDatout

simDataIncreaseProdyCU1 <- simDataIncreaseProd %>% 
  filter(CU==1 & iteration ==1)

colnames(simDataIncreaseProdyCU1)[c(2,4,5,8,9)] <- c("byr","spwn", "rec", "alpha_true", "beta_true")

# then fit DLM
dlm_model <- fitDLM(data = simDataIncreaseProdyCU1,
              alpha_vary = TRUE,
              beta_vary = FALSE)

# then plot
plotDLM(dlm_model)

This is pretty decent as well. How about a scenario where only capacity (1/beta) changes over time and where we fit a model that only estimates time varying beta?

# first read in simulate data for scenario with increaseing productivity
# simDataIncreaseCapacity <- read.csv(here("outputs", "simData", "example",
#                          "increaseCapacity_ref_CUsrDat.csv"),
#                        stringsAsFactors = F)

simDataIncreaseCapacity <- readRDS(here("outputs", "simData", "example",
                         "increaseCapacity_ref_CUsrDat.RData"))$srDatout

simDataIncreaseCapacityCU1 <- simDataIncreaseCapacity %>% 
  filter(CU==1 & iteration ==1)

colnames(simDataIncreaseCapacityCU1)[c(2,4,5,8,9)] <- c("byr","spwn", "rec", "alpha_true", "beta_true")

# then fit DLM
dlm_model <- fitDLM(data = simDataIncreaseCapacityCU1,
              alpha_vary = FALSE,
              beta_vary = TRUE)

# then plot
plotDLM(dlm_model)

The model does well here too.

Lastly, how does the DLM do with a population that has time varying productivity (alpha) and capacity (1/beta)?

# first read in simulate data for scenario with increasing productivity and capacity
# simDataIncreaseProdCapacity <- read.csv(here("outputs", "simData", "example",
#                          "IncreaseProd&Capacity_ref_CUsrDat.csv"),
#                        stringsAsFactors = F)

simDataIncreaseProdCapacity <- readRDS(here("outputs", "simData", "example",
                         "increaseProd&Capacity_ref_CUsrDat.RData"))$srDatout

simDataIncreaseProdCapacityCU1 <- simDataIncreaseProdCapacity %>% 
  filter(CU==1 & iteration ==1)

colnames(simDataIncreaseProdCapacityCU1)[c(2,4,5,8,9)] <- c("byr","spwn", "rec", "alpha_true", "beta_true")

# then fit DLM
dlm_model <- fitDLM(data = simDataIncreaseProdCapacityCU1,
              alpha_vary = TRUE,
              beta_vary = TRUE)

# then plot
plotDLM(dlm_model)

Looks like the model has a hard time estimating both alpha and beta, at least for this particular simulated dataset, when both change the same magnitude and direction over the same time period.

TESA-workshops/TESAsamSim documentation built on Feb. 6, 2021, 12:25 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

Tweet to @rdrrHQ

GitHub issue tracker

ian@mutexlabs.com