knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE)
library(here)
library(tidyverse)  
library(atlantisom)
library(ggthemes)
library(FSA)

Simulating input data from an ecosystem model

Here we use existing Atlantis ecosystem model output to generate input datasets for a variety of simpler population models, so that the performance of these models can be evaluated against known (simulated) ecosystem dynamics. Atlantis is an end-to-end spatial ecosystem model capable of including climate effects, seasonal migration, food web, and fishery interactions [@audzijonyte_atlantis_2019].

We extract simulated data using the R package atlantisom. The purpose of atlantisom is to use existing Atlantis model output to generate input datasets for a variety of models, so that the performance of these models can be evaluated against known (simulated) ecosystem dynamics. The process is briefly described here. Atlantis models can be run using different climate forcing, fishing, and other scenarios. Users of atlantisom specify fishery independent and fishery dependent sampling in space and time, as well as species-specific catchability, selectivty, and other observation processes for any Atlantis scenario. Internally consistent multispecies and ecosystem datasets with known observation error characteristics are atlantisom outputs, for use in individual model performance testing, comparing performance of alternative models, and performance testing of model ensembles against "true" Atlantis outputs.

The simulated dataset is based on output of the Norwegian and Barents Sea (NOBA) Atlantis model [@hansen_set-up_2016; @hansen_sensitivity_2019]. The simulated dataset contains comparable survey, fishery, and composition data as the Georges Bank dataset, but the time series span 80 simulation years and include 11 species. This dataset was used for initial model development, code quality testing, and model skill assessment by the modeling teams. Details of the dataset including construction and basic attributes are below.

Species in the ms-keyrun dataset

Our initial species selection includes 11 single species groups from the Atlantis NOBA model.

Generate the species table

lname <- data.frame(Name= c("Long_rough_dab",
                                 "Green_halibut",
                                 "Mackerel",
                                 "Haddock",
                                 "Saithe",
                                 "Redfish",
                                 "Blue_whiting",
                                 "Norwegian_ssh",
                                 "North_atl_cod",
                                 "Polar_cod",
                                 "Capelin"),
                    Long.Name = c("Long rough dab",
                                 "Greenland halibut",
                                 "Mackerel",
                                 "Haddock",
                                 "Saithe",
                                 "Redfish",
                                 "Blue whiting",
                                 "Norwegian spring spawning herring",
                                 "Northeast Atlantic Cod",
                                 "Polar cod",
                                 "Capelin"),
                    Latin = c("*Hippoglossoides platessoides*",
                              "*Reinhardtius hippoglossoides*",
                              "*Scomber scombrus*",
                              "*Melongrammus aeglefinus*",
                              "*Pollachius virens*",
                              "*Sebastes mentella*",
                              "*Micromesistius poutassou*",
                              "*Clupea harengus*",
                              "*Gadus morhua*",
                              "*Boreogadus saida*",
                              "*Mallotus villosus*"),
                    Code = c("LRD", "GRH", "MAC", "HAD", "SAI", "RED", 
                             "BWH", "SSH", "NCO", "PCO", "CAP")
)

# sppsubset <- merge(fgs, lname, all.y = TRUE)
# spptable <- sppsubset %>% 
#   arrange(Index) %>%
#   select(Name, Long.Name, Latin)

spptable <- lname %>%
   select(Name, Long.Name, Latin)

knitr::kable(spptable, col.names = c("Model name", "Full name", "Latin name"))

Generating a dataset

Configuration files specified once

NOBA_sacc38Config.R specifies location and names of files needed for atlantisom to initialize:


omdimensions.R standardizes timesteps, etc. (this is part of atlantisom and should not need to be changed by the user):


Change these survey and fishery config files

mssurvey_spring.R and mssurvey_fall.R configure the fishery independent surveys (in this census test, surveys sample all model polygons in all years and have efficiency of 1 for all species, with no size selectivity):



msfishery.R configures the fishery dependent data:


Run atlantisom and save outputs

True datasets are generated as follows, using atlantisom wrapper functions om_init to assemble initial true atlantis data, om_species to subset true data for desired species, om_index to generate survey biomass and total catch biomass indices, om_comps to generate age and length compositions and average weight at age from surveys and fisheries, and om_diet to generate diet from surveys. Outputs are saved to the atlantisoutput folder (not kept on github due to size):

NOBAom <- om_init(here("data-raw/simulated-data/config/NOBA_sacc38Config.R"))

NOBAom_ms <- om_species(spptable$Name, NOBAom)

#need to change internal call to source in atlantisom om_index om_comps and om_diet functions
#expecting a config folder in same directory as rmd
#this is a workaround

dir.create(file.path(here("docs/config")))

file.copy(here("data-raw/simulated-data/config/omdimensions.R"), here("docs/config/omdimensions.R"))

NOBAom_ms_ind <- om_index(usersurvey = c(here("data-raw/simulated-data/config/mssurvey_spring.R"), 
                                         here("data-raw/simulated-data/config/mssurvey_fall.R")), 
                           userfishery = here("data-raw/simulated-data/config/msfishery.R"),
                           omlist_ss = NOBAom_ms, 
                           n_reps = 1, 
                           save = TRUE)

NOBAom_ms_comp <- om_comps(usersurvey = c(here("data-raw/simulated-data/config/mssurvey_spring.R"), 
                                         here("data-raw/simulated-data/config/mssurvey_fall.R")), 
                           userfishery = here("data-raw/simulated-data/config/msfishery.R"),
                           omlist_ss = NOBAom_ms, 
                           n_reps = 1, 
                           save = TRUE)

NOBAom_ms_diet <- om_diet(config = here("data-raw/simulated-data/config/NOBA_sacc38Config.R"),
                          dietfile = "NOBADetDiet.gz",
                          usersurvey = c(here("data-raw/simulated-data/config/mssurvey_spring.R"), 
                                         here("data-raw/simulated-data/config/mssurvey_fall.R")), 
                          omlist_ss = NOBAom_ms, 
                          n_reps = 1, 
                          save = TRUE)

unlink(here("docs/config"), recursive = TRUE)

Create mskeyrun simulated data

Scripts in ms-keyrun/data-raw show the process of making mskeyun datasets from atlantisom output generated above. Atlantis outputs and atlantisom outputs produced above are local to Sarah's computer for this code to run, as they are too large for github. However, the scripts are linked here to show the process. Overall this script creates all datasets using functions specific to each data type:

Code to build simulated datasets


For example, create_sim_survey_index() takes the saved atlantisom output plus user specifications for fit start and end years to produce the dataset mskeyrun::simSurveyIndex:


All functions are in this mskeyrun repository folder: https://github.com/NOAA-EDAB/ms-keyrun/tree/master/data-raw/R

Additional simulated data for food web models

The data generated above focuses on the 11 fully age structured stocks. Here we add information needed for food web modeling on the remaining groups in the simulated system.

Code for remaining groups

# all NOBA species not already in mskeyrun dataset
fwspp <- atlantisom::load_fgs(here("data-raw/data"), "nordic_groups_v04.csv") %>% 
  dplyr::filter(IsTurnedOn == 1) %>%
  dplyr::select(Code, Name, Long.Name, isFished, InvertType) %>%
  dplyr::filter(!Name %in% spptable$Name)

knitr::kable(fwspp)

This consists of survey biomass and diet, and fishery catch information for the non age-structured species.

To generate this, we apply selected functions from the atlantisom package with slightly modified config files to ensure the food web "fw" datasets don't overwrite the multispecies datasets:

NOBAom <- om_init(here("data-raw/simulated-data/config/NOBA_sacc38Config.R"))

NOBAom_fw <- om_species(fwspp$Name, NOBAom, save = FALSE) #save by hand, don't overwrite 11 species omlist_ss

saveRDS(NOBAom_fw, file.path(d.name, paste0(scenario.name, "omlist_fw.rds")))

dir.create(file.path(here("vignettes/config")))

file.copy(here("data-raw/simulated-data/config/omdimensions.R"), here("vignettes/config/omdimensions.R"))

NOBAom_fw_ind <- om_index(usersurvey = c(here("data-raw/simulated-data/config/mssurvey_spring_fw.R"), 
                                         here("data-raw/simulated-data/config/mssurvey_fall_fw.R")), 
                           userfishery = here("data-raw/simulated-data/config/msfishery_fw.R"),
                           omlist_ss = NOBAom_fw, 
                           n_reps = 1, 
                           save = TRUE)

NOBAom_fw_diet <- om_diet(config = here("data-raw/simulated-data/config/NOBA_sacc38Config.R"),
                          dietfile = "NOBADetDiet.gz",
                          usersurvey = c(here("data-raw/simulated-data/config/mssurvey_spring_fw.R"), 
                                         here("data-raw/simulated-data/config/mssurvey_fall_fw.R")), 
                          omlist_ss = NOBAom_fw, 
                          n_reps = 1, 
                          save = TRUE)

unlink(here("vignettes/config"), recursive = TRUE)

Visualize simulated data

Plotting functions

A collection of functions used previously that may be harvested and modified for diagnostics or visualizations in the ms-keyrun and ICES WGSAM skill assessment projects.

Code for plotting functions

# plot biomass time series facet wrapped by species
plotB <- function(dat, truedat=NULL){

    svbio <- dat %>% filter(variable=="biomass")
    svcv <- dat %>% filter(variable=="cv")

    ggplot() +
    geom_line(data=svbio, aes(x=year,y=value, color="Survey Biomass"), 
              alpha = 10/10) +
    {if(!is.null(truedat)) geom_line(data=truedat, aes(x=time/365,y=atoutput, color="True B"), alpha = 3/10)} + 
    theme_tufte() +
    theme(legend.position = "top") +
    xlab("model year") +
    ylab("tons") +
    labs(colour=dat$ModSim) +
    facet_wrap(~Name, scales="free") 

}

# make a catch series function that can be split by fleet? this doesnt
# also note different time (days) from model timestep in all other output
plotC <- function(dat, truedat=NULL){

    ctbio <- dat %>% filter(variable=="catch")
    ctcv <- dat %>% filter(variable=="cv")

    ggplot() +
    geom_line(data=ctbio, aes(x=year,y=value, color="Catch biomass"), 
              alpha = 10/10) +
    {if(!is.null(truedat)) geom_line(data=truedat, aes(x=time/365,y=atoutput, color="True Catch"), alpha = 3/10)} + 
    theme_tufte() +
    theme(legend.position = "top") +
    xlab("model year") +
    ylab("tons") +
    labs(colour=dat$ModSim) +
    facet_wrap(~Name, scales="free") 

}

# note on ggplot default colors, can get the first and second using this
# library(scales)
# show_col(hue_pal()(2))

# plot length frequencies by timestep (one species)
plotlen <- function(dat, effN=1, truedat=NULL){

  cols <- c("Census Lcomp"="#00BFC4","Sample Lcomp"="#F8766D")  
  ggplot(mapping=aes(x=lenbin)) +
    {if(is.null(truedat)) geom_bar(data=dat, aes(weight = value/effN))} +
    {if(!is.null(truedat)) geom_bar(data=dat, aes(weight = censuslen/totlen, fill="Census Lcomp"), alpha = 5/10)} +
    {if(!is.null(truedat)) geom_bar(data=dat, aes(weight = atoutput/effN, fill="Sample Lcomp"), alpha = 5/10)} +
    theme_tufte() +
    theme(legend.position = "bottom") +
    xlab("length (cm)") +
    {if(is.null(truedat)) ylab("number")} +
    {if(!is.null(truedat)) ylab("proportion")} +
    scale_colour_manual(name="", values=cols) +
    labs(subtitle = paste(dat$ModSim,
                          dat$Name)) +
    facet_wrap(~year, ncol=6, scales="free_y")

}

# plot numbers at age by timestep (one species)
Natageplot <- function(dat, effN=1, truedat=NULL){
  ggplot() +
    geom_point(data=dat, aes(x=age, y=value/effN, color="Est Comp")) +
    {if(!is.null(truedat)) geom_line(data=dat, aes(x=agecl, y=numAtAge/totN, color="True Comp"))} + 
    theme_tufte() +
    theme(legend.position = "bottom") +    
    xlab("age") +
    {if(is.null(truedat)) ylab("number")} +
    {if(!is.null(truedat)) ylab("proportion")} +
    labs(subtitle = paste(dat$ModSim,
                          dat$Name)) + 
    facet_wrap(~year, ncol=6, scales="free_y")
}

# plot weight at age time series facet wrapped by species
wageplot <- function(dat, truedat=NULL){
  ggplot(dat, aes(year, value)) +
    geom_line(aes(colour = factor(age))) +
    theme_tufte() +
    theme(legend.position = "bottom") +
    xlab("model year") +
    ylab("average individual weight (g)") +
    labs(subtitle = paste0(dat$ModSim)) +
    facet_wrap(c("Name"), scales="free_y")
}

Read in the "data" to check with plots

The mskeyrun simulated data objects are all dataframes.

survObsBiom <- mskeyrun::simSurveyIndex #atlantisom::read_savedsurvs(d.name, 'survB')
#age_comp_data <- mskeyrun::simSurveyAgeLencomp #atlantisom::read_savedsurvs(d.name, 'survAge') #not using in assessment
len_comp_data <- mskeyrun::simSurveyLencomp #atlantisom::read_savedsurvs(d.name, 'survLen')
#wtage <- atlantisom::read_savedsurvs(d.name, 'survWtage')  #not using in assessment
annage_comp_data <- mskeyrun::simSurveyAgecomp #atlantisom::read_savedsurvs(d.name, 'survAnnAge')
annage_wtage <- mskeyrun::simSurveyWtatAge #atlantisom::read_savedsurvs(d.name, 'survAnnWtage')

#all_diets <- atlantisom::read_savedsurvs(d.name, 'survDiet') #not using in assessment

catchbio_ss <- mskeyrun::simCatchIndex #atlantisom::read_savedfisheries(d.name, 'Catch')
catchlen_ss <- mskeyrun::simFisheryLencomp #atlantisom::read_savedfisheries(d.name, "catchLen")
#fish_age_comp <- #atlantisom::read_savedfisheries(d.name, "catchAge")
fish_annage_comp <- mskeyrun::simFisheryAgecomp #atlantisom::read_savedfisheries(d.name, 'catchAnnAge')
fish_annage_wtage <- mskeyrun::simFisheryWtatAge #atlantisom::read_savedfisheries(d.name, 'catchAnnWtage')

Visualize survey outputs {.tabset}

These plots represent the full mskeyrun simulated time series for the survey biomass index and weight at age, and a subset for length and age composition.

Survey biomass index

# compare with true output (all timesteps)
# for(s in names(survObsBiom)){
#   cat("  \n##### ",  s,"  \n")
#   print(plotB(survObsBiom[[s]][[1]], omlist_ss$truetotbio_ss))
#   cat("  \n")
# }

# plots survey only
 for(s in unique(survObsBiom$survey)){
   cat("  \n##### ",  s,"  \n")
   print(plotB(survObsBiom %>%
                 filter((survey %in% s))))
   cat("  \n")
 }

Survey length composition

# not the full time series, just 24 yrs after catches start in year 55
for(s in unique(len_comp_data$survey)){
  cat("  \n##### ",  s,"  \n")
  lcompsub <- len_comp_data %>% filter(survey %in% s) %>% filter(year %in% c(55:78)) %>%
    group_by(Name) %>%
    group_map(~ plotlen(.x), keep = TRUE)

  for(i in 1:length(lcompsub)) {
    print(lcompsub[[i]])
  }
  cat("  \n")
}

Survey age composition (annual ages)

for(s in unique(annage_comp_data$survey)){
  cat("  \n##### ",  s,"  \n")
  acompsub <- annage_comp_data %>% filter(survey %in% s) %>% filter(year %in% c(55:78)) %>%
    group_by(Name) %>%
    #left_join(., trueNage) %>%
    group_map(~ Natageplot(.x), keep = TRUE) # plots only sampled age comp
    #group_map(~ Natageplot(.x, effN = 100000, truedat = 1), keep = TRUE) # plots merged true age comp

  for(i in 1:length(acompsub)) {
    print(acompsub[[i]])
  }
  cat("  \n")
}

Survey iterpolated weight at age (annual ages)

for(s in unique(annage_wtage$survey)){
  cat("  \n##### ",  s,"  \n")
  print(wageplot(annage_wtage %>%
                 filter((survey %in% s))))
  cat("  \n")
}

{-}

Visualize fishery data {.tabset}

These plots represent the the full mskeyrun simulated time series for fishery catch and weight at age, and a subset for length and age composition.

Fishery catch time series

# observed catch only
plotC(catchbio_ss)

Fishery length composition

lcompsub <- catchlen_ss %>% filter(year %in% c(55:78)) %>%
  group_by(Name) %>%
  group_map(~ plotlen(.x), keep = TRUE)

for(i in 1:length(lcompsub)) {
  print(lcompsub[[i]])
}

Fishery catch at age (annual ages)

acompsub <- fish_annage_comp %>% filter(year %in% c(55:78)) %>%
  group_by(Name) %>%
  #left_join(., trueCAA) %>%
  group_map(~ Natageplot(.x), keep = TRUE) # plots only sampled age comp
  #group_map(~ Natageplot(.x, effN = 200, truedat = 1), keep = TRUE) # plots merged true age comp

for(i in 1:length(acompsub)) {
  print(acompsub[[i]])
}

Fishery interpolated weight at age (annual ages)

wageplot(fish_annage_wtage)

{-}

Write to model input files

Our goal was to have a reproducible process for all aspects of data generation through model inputs. The simulated data is included in the mskeyrun data package, data inputs are derived from those sources.

Multispecies surplus production model (MSSPM)

[To be added]

Length-structured multispecies model (Hydra)

Hydra input files were developed directly from mskeyrun datasets by modifying functions in the hydradata R package. The function create_Rdata_mskeyrun.R (code) allows the user to specify whether datasets should be constructed from Atlantis-simulated or real Georges Bank data, and the number of length bins to use for composition data, then creates an R data object. This data object is then used to create data and parameter input files using the function hydradata::create_datpin_files().

For example, this process creates the simulated dataset with 5 length bins:

The create_RData_mskeyrun.R file was sourced, and create_RData_mskeyrun("sim", nlenbin = 5) was run to make a new hydraDataList_msk.rda file in the package. Then the package is built locally, R is restarted, and the following code block is run to produce input files.

library(here)
library(hydradata)
inputs <- setup_default_inputs()
inputs$outDir <- here()

inputs$outputFilename <- "hydra_sim_NOBA_5bin_0comp" 

# tpl code removes 0 so replace in data
nbins <- hydraDataList_msk$Nsizebins
hydraDataList_msk$observedCatchSize[,7:(6+nbins)][hydraDataList_msk$observedCatchSize[,7:(6+nbins)]==0] <- 1e-4
hydraDataList_msk$observedSurvSize[,6:(5+nbins)][hydraDataList_msk$observedSurvSize[,6:(5+nbins)]==0] <- 1e-4

hydraDataList_5bin_0comp <- create_datpin_files(inputs,hydraDataList_msk)

# this saves the specific hydralist object, so we could saveRDS it to a diagnostics folder?
# advantage of rds format is we can assign it when reading it in to diagnostics scripts
saveRDS(hydraDataList_5bin_0comp, file.path(here("inputRdatalists/hydraDataList_5bin_0comp.rds")))

Run create_RData_mskeyrun("sim", nlenbin = 10) in hydradata, rebuild package, restart R, then...

library(here)
library(hydradata)
inputs <- setup_default_inputs()
inputs$outDir <- here()

inputs$outputFilename <- "hydra_sim_NOBA_10bin_0comp" 

# tpl code removes 0 so replace in data
nbins <- hydraDataList_msk$Nsizebins
hydraDataList_msk$observedCatchSize[,7:(6+nbins)][hydraDataList_msk$observedCatchSize[,7:(6+nbins)]==0] <- 1e-4
hydraDataList_msk$observedSurvSize[,6:(5+nbins)][hydraDataList_msk$observedSurvSize[,6:(5+nbins)]==0] <- 1e-4


hydraDataList_10bin_0comp <- create_datpin_files(inputs,hydraDataList_msk)

# this saves the specific hydralist object, so we could saveRDS it to a diagnostics folder?
# advantage of rds format is we can assign it when reading it in to diagnostics scripts
saveRDS(hydraDataList_10bin_0comp, file.path(here("inputRdatalists/hydraDataList_10bin_0comp.rds")))

Age structured multispecies statistical catch at age model (MSSCAA)

Work in progress here but not finished for October 2022 review.

References



NOAA-EDAB/ms-keyrun documentation built on April 20, 2024, 10:07 a.m.