```r
knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE)

library(here)
library(tidyverse)
library(atlantisom)
library(ggthemes)
library(FSA)
```
Here we use existing Atlantis ecosystem model output to generate input datasets for a variety of simpler population models, so that the performance of these models can be evaluated against known (simulated) ecosystem dynamics. Atlantis models simulate a wide range of physical and ecological processes, including a full food web, and can be run using different climate forcing, fishing, and other scenarios.
We extract simulated data using the R package atlantisom. The purpose of atlantisom is to use existing Atlantis model output to generate input datasets for a variety of models, so that the performance of these models can be evaluated against known (simulated) ecosystem dynamics. Atlantis models can be run using different climate forcing, fishing, and other scenarios. Users of atlantisom will be able to specify fishery-independent and fishery-dependent sampling in space and time, as well as species-specific catchability, selectivity, and other observation processes for any Atlantis scenario. The atlantisom outputs are internally consistent multispecies and ecosystem datasets with known observation error characteristics, for use in individual model performance testing, comparing performance of alternative models, and performance testing of model ensembles against "true" Atlantis outputs.
Our initial species selection includes 11 single species groups from the Atlantis NOBA model.
```r
#fgs <- load_fgs(here("simulated-data/atlantisoutput/NOBA_March_2020"), "nordic_groups_v04.csv")

lname <- data.frame(Name = c("Long_rough_dab", "Green_halibut", "Mackerel", "Haddock",
                             "Saithe", "Redfish", "Blue_whiting", "Norwegian_ssh",
                             "North_atl_cod", "Polar_cod", "Capelin"),
                    Long.Name = c("Long rough dab", "Greenland halibut", "Mackerel", "Haddock",
                                  "Saithe", "Redfish", "Blue whiting",
                                  "Norwegian spring spawning herring", "Northeast Atlantic Cod",
                                  "Polar cod", "Capelin"),
                    Latin = c("*Hippoglossoides platessoides*", "*Reinhardtius hippoglossoides*",
                              "*Scomber scombrus*", "*Melanogrammus aeglefinus*",
                              "*Pollachius virens*", "*Sebastes mentella*",
                              "*Micromesistius poutassou*", "*Clupea harengus*",
                              "*Gadus morhua*", "*Boreogadus saida*", "*Mallotus villosus*"),
                    Code = c("LRD", "GRH", "MAC", "HAD", "SAI", "RED", "BWH", "SSH",
                             "NCO", "PCO", "CAP")
)

# sppsubset <- merge(fgs, lname, all.y = TRUE)
# spptable <- sppsubset %>%
#   arrange(Index) %>%
#   select(Name, Long.Name, Latin)

spptable <- lname %>%
  select(Name, Long.Name, Latin)

knitr::kable(spptable, col.names = c("Model name", "Full name", "Latin name"))
```
Below we will double-check survey census outputs against the true values, and then test some survey specifications that may be reasonable for multispecies model testing. Finally, we will apply survey functions to the detailed diet data.
- `NOBA2config.R` specifies the location and names of files needed for atlantisom to initialize.
- `omdimensions.R` standardizes timesteps, etc. (this is part of atlantisom and should not need to be changed by the user).
- `mssurvey_spring.R` and `mssurvey_fall.R` configure the fishery-independent surveys (in this census test, surveys sample all model polygons in all years and have an efficiency of 1 for all species, with no size selectivity); a rough sketch of the kinds of settings involved follows this list.
- `msfishery.R` configures the fishery-dependent data.
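The contents of the configuration files themselves are not reproduced here. As a rough, hypothetical illustration only (object names follow those referenced by the survey code later in this document; the values shown are placeholders, not the settings actually used), a census survey configuration defines objects along these lines:

```r
# Illustrative sketch only -- see simulated-data/config for the real files
survey.name <- "census_survey_example"                        # hypothetical survey name
survboxes   <- allboxes                                       # census: sample every model polygon
                                                              # (allboxes assumed from omdimensions.R)
surveffic   <- data.frame(species = survspp, efficiency = 1)  # efficiency of 1 for all species
surv_cv     <- data.frame(species = survspp, cv = 0)          # no added observation error
# selectivity objects (flat, i.e. no size/age selectivity in the census case) are set similarly
```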
True datasets are generated as follows, using atlantisom wrapper functions: `om_init` to assemble initial true Atlantis data, `om_species` to subset true data for desired species, `om_index` to generate survey biomass and total catch biomass indices, `om_comps` to generate age and length compositions and average weight at age from surveys and fisheries, and `om_diet` to generate diet from surveys. Outputs are saved to the `atlantisoutput` folder (not kept on GitHub due to size):
```r
NOBAom <- om_init(here("simulated-data/config/NOBA2Config.R"))

NOBAom_ms <- om_species(sppsubset$Name, NOBAom)

#need to change internal call to source in atlantisom om_index om_comps and om_diet functions
#expecting a config folder in same directory as rmd
#this is a workaround
dir.create(file.path(here("docs/config")))
file.copy(here("simulated-data/config/omdimensions.R"), here("docs/config/omdimensions.R"))

NOBAom_ms_ind <- om_index(usersurvey = c(here("simulated-data/config/mssurvey_spring.R"),
                                         here("simulated-data/config/mssurvey_fall.R")),
                          userfishery = here("simulated-data/config/msfishery.R"),
                          omlist_ss = NOBAom_ms,
                          n_reps = 1,
                          save = TRUE)

NOBAom_ms_comp <- om_comps(usersurvey = c(here("simulated-data/config/mssurvey_spring.R"),
                                          here("simulated-data/config/mssurvey_fall.R")),
                           userfishery = here("simulated-data/config/msfishery.R"),
                           omlist_ss = NOBAom_ms,
                           n_reps = 1,
                           save = TRUE)

NOBAom_ms_diet <- om_diet(config = here("simulated-data/config", "NOBA2config.R"),
                          dietfile = "NOBADetDiet.gz",
                          usersurvey = c(here("simulated-data/config/mssurvey_spring.R"),
                                         here("simulated-data/config/mssurvey_fall.R")),
                          omlist_ss = NOBAom_ms,
                          n_reps = 1,
                          save = TRUE)

unlink(here("docs/config"), recursive = TRUE)
```
Our goal was to have a reproducible process for all aspects of data generation through model inputs. This section gives examples of how input datasets were built. Now that the simulated data is included in the `mskeyrun` data package, data inputs are derived from those sources.
Example input files are in the `model-input-files` directory. First we write the fairly simple biomass and catch inputs for the multispecies surplus production model MSSPM from our census survey outputs.
o.name <- here("simulated-data/msinput/msspm/") stepperyr <- if(NOBAom_ms$runpar$outputstepunit=="days") 365/NOBAom_ms$runpar$toutinc #read in survey biomass data survObsBiom <- read_savedsurvs(d.name, 'survB') #reads in all surveys #multiple surveys named in list object, use as filename for(s in names(survObsBiom)){ #arrange into wide format: year, Species1, Species2 ... and write csv svbio <- survObsBiom[[s]][[1]] %>% #calc_timestep2time(d.name, ., run.prm.file) %>% #this gives an arbitrary start year of 1970 dplyr::mutate(year = ceiling(time/stepperyr)) %>% dplyr::select(species, year, atoutput) %>% tidyr::spread(species, atoutput) %>% write_csv(paste0(o.name,s,".csv")) } #read in catch data read_savedfisheries <- function(dir, type){ datlook <- data.frame(dattype = c('Catch', 'catchAge', 'catchLen', 'catchWtage', 'catchAnnAge', 'catchAnnWtage'), pattern = c("*fishCatch.rds", "*fishObsAgeComp.rds", "*fishObsLenComp.rds", "*fishObsWtAtAge.rds", "*fishObsFullAgeComp.rds", "*fishObsFullWtAtAge.rds")) fisheries <- list.files(path=dir, pattern = as.character(datlook$pattern[datlook$dattype %in% type]), full.names = TRUE) fishery.name <- str_match(fisheries, paste0(scenario.name,"_\\s*(.*?)\\s",datlook$pattern[datlook$dattype==type]))[,2] dat <- lapply(fisheries, readRDS) names(dat) <- fishery.name return(dat) } catchbio_ss <- read_savedfisheries(d.name, 'Catch') for(c in names(catchbio_ss)){ #arrange into wide format: year, Species1, Species2 ... and write csv catchbio <- catchbio_ss[[c]][[1]] %>% dplyr::mutate(year = time/365) %>% dplyr::select(species, year, atoutput) %>% tidyr::spread(species, atoutput) %>% write_csv(paste0(o.name,"fishery_",c,".csv")) } #write readme for metadata meta <- c("Simulated data from Norweigan-Barents Atlantis Model by Hansen et al. \n\n", paste("Atlantis outputs stored in", d.name, '\n\n'), paste("Biomass output for surveys", names(survObsBiom), '\n\n'), paste("Fishery catch output for fisheries", names(catchbio_ss), '\n\n'), "Biomass and catch output units are annual tons.\n\n", "Survey specifications:\n", "----------------------\n", read_file(here("/simulated-data/config/mssurvey_fall.R")), "\n", "----------------------\n", read_file(here("/simulated-data/config/mssurvey_spring.R")), "\n", "----------------------\n", read_file(here("/simulated-data/config/mssurvey_fall_01.R")), "\n", "----------------------\n", read_file(here("/simulated-data/config/mssurvey_spring_01.R")), "\n\n", "Fishery specifications:\n", "----------------------\n", read_file(here("/simulated-data/config/msfishery.R")), "\n", "----------------------\n", read_file(here("/simulated-data/config/msfishery_01.R")), "\n" ) cat(as.character(meta), file=paste0(here("simulated-data/msinput/msspm/"),"README.txt"))
The MSSPM needs diet composition information as well. Here we supply two survey time series of diet data, one census and a trial survey dataset with more typical survey coverage issues.
Diet composition is by predator age class for each time (unit=time.days) and is a proportion (sums to 1 for each time). We get species diet composition by aggregating over age classes (weighting diet composition at age by true biomass at age).
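In equation form, the aggregation implemented in the next chunk is a biomass-weighted average over age classes: for predator species $s$, prey $p$, and time $t$,

$$D_{s,p,t} \;=\; \frac{\sum_{a} d_{s,a,p,t}\, B_{s,a,t}}{\sum_{a} B_{s,a,t}},$$

where $d_{s,a,p,t}$ is the sampled diet proportion of prey $p$ for predator age class $a$ and $B_{s,a,t}$ is true biomass at age.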
source(here("simulated-data/config/NOBA2Config.R")) all_diets <- read_savedsurvs(d.name, 'survDiet') # divide time.days by runpar$toutinc to match time in biomass # aggregate over ages using true biomass at age (survey diet is in proportion by age) NOBAom_ms <- readRDS(file.path(d.name, paste0(scenario.name, "omlist_ss.rds"))) trueB_age_allbox <- NOBAom_ms$truebio_ss %>% #sum over polygons #aggregate(dat$atoutput,list(dat$species,dat$agecl,dat$time, dat$prey),sum) group_by(species,agecl,time) %>% summarise(trueBatage = sum(atoutput)) trueB <- trueB_age_allbox %>% group_by(species, time) %>% summarise(trueB = sum(trueBatage)) trueBagecl <- merge(trueB_age_allbox, trueB) #multiple surveys named in list object, use as filename for(s in names(all_diets)){ #arrange into wide format: year, Species1, Species2 ... and write csv svdiet <- all_diets[[s]][[1]] %>% #calc_timestep2time(d.name, ., run.prm.file) %>% #this gives an arbitrary start year of 1970 dplyr::mutate(time = time.days/NOBAom_ms$runpar$toutinc) %>% dplyr::left_join(., trueBagecl) %>% dplyr::mutate(dietBwt = dietSamp*trueBatage) %>% dplyr::group_by(species,time,prey,trueB) %>% dplyr::summarise(totdietBwt = sum(dietBwt)) %>% dplyr::mutate(totdietComp = totdietBwt/trueB) %>% dplyr::mutate(year = ceiling(time/stepperyr)) %>% dplyr::ungroup() %>% dplyr::select(year, species, prey, totdietComp) %>% write_csv(paste0(o.name,"diet_",s,".csv")) }
Note that this is the old method, preserved here for history; Hydra inputs now come directly from `mskeyrun`. Much of this code formed the basis for the functions that build mskeyrun simulated data objects, now in `mskeyrun\data-raw\R`.
For the length-based model hydra, we can write .csv files to be used as input in the hydradata package, which writes .dat and .pin files.
Hydra needs survey biomass time series, catch time series, survey length composition, fishery length composition, and diet composition data.
Observations for biomass and catch are very similar to those created for MSSPM. Hydra inputs will need to be in the same species order for each dataset, so first we make a species file and some of the other input data that can come straight from Atlantis.
# foodweb and species file identify predators and prey; take from true atlantis diet comp diettrue <- load_diet_comp(here("simulated-data/atlantisoutput/NOBA_march_2020"), "nordic_runresults_01DietCheck.txt", fgs = fgs) ms_diet <- diettrue %>% filter(species %in% sppsubset$Name) %>% filter(atoutput>0) %>% mutate(species = as.character(species), prey = as.character(prey)) predPrey <- ms_diet %>% filter(prey %in% unique(species)) %>% select(species, agecl, time.days, prey, atoutput) %>% group_by(species, prey) %>% summarise(avgpreyprop = mean(atoutput)) foodweb <- left_join(sppsubset, predPrey, by=c("Name" = "species")) %>% select(species, prey, avgpreyprop) %>% mutate(avgpreyprop = ceiling(avgpreyprop)) %>% spread(species, avgpreyprop) %>% filter(!is.na(prey)) %>% replace(is.na(.), 0) predOrPrey <- ifelse(colSums(foodweb[-1])>0, 1, 0) specieslist <- data.frame(species = unique(unique(survObsBiom[[2]][[1]]$species)), guild = rep("na"), guildmember = c(1:11), predOrPrey = predOrPrey ) # take lenwt parameters from true atlantis pars lenwtpar <- left_join(sppsubset, NOBAom_ms$biol$wl, by=c("Code" = "group")) %>% select(Name, a, b) %>% arrange(Name) # fixed parameters: # intake same for all teleosts intakespp <- rbind(species = sort.default(sppsubset$Name), alpha = rep(0.004), beta = rep(0.11)) colnames(intakespp) <- intakespp[1,] intakespp <- as.data.frame(intakespp[-1,]) # predator size preference ratio same for all species sizepref <- rbind(species = sort.default(sppsubset$Name), mu = rep(0.5), sigma = rep(2)) colnames(sizepref) <- sizepref[1,] sizepref <- as.data.frame(sizepref[-1,]) # dimensions for singletons file singletons <- data.frame( debug=3, Nyrs=NOBAom_ms$runpar$nyears, #may want to take from survey time series instead Nspecies=length(sppsubset$Name), Nsizebins=5, Nareas=1, Nfleets=length(unique(NOBAom_ms$truecatchage_ss$fleet)), Nsurveys=2, wtconv=1, yr1Nphase=0, recphase=0, avg_rec_phase=0, avg_F_phase=0, dev_rec_phase=0, dev_F_phase=0, fqphase=0, sqphase=0, ssig_phase=0, csig_phase=0, phimax=1, scaleInitialN=1, otherFood=0000, LFI_size=60, bandwidth_metric=5, assessmentPeriod=3, flagMSE=0, flagLinearRamp=1, baseline_threshold=0.2 ) # M1 same for all species, depends on n length bins # to get annual M1 of 0.1, this matrix needs to be M1 per timestep # right now input works for 5 timesteps per year, but should be # done in the code with input annual M1 in case timestep changes M1mat <- matrix(0.02, singletons$Nspecies, singletons$Nsizebins, dimnames = list(c(sort.default(sppsubset$Name)), c(paste0("sizeclass",seq(1:singletons$Nsizebins))))) M1 <- as.data.frame(M1mat) # sexratio fixed 50:50 for all species sexratio <- matrix(0.5, 1, singletons$Nspecies, dimnames = list("ratio", c(sort.default(sppsubset$Name)))) # placeholders for unused covariates dimenstioned to the number of simulated years dummycovar <- data.frame(dummy1 = rep(1, NOBAom_ms$runpar$nyears)) # placeholder for effort data dimensioned by year and nfleets allfleets <- atlantisom::load_fisheries(d.name, fisheries.file) fleets <- unique(NOBAom_ms$truecatchage_ss$fleet) dummyeffort <- data.frame(1:singletons$Nyrs, matrix(1, singletons$Nyrs, singletons$Nfleets)) names(dummyeffort) <- c("year", fleets) # will need to fit von B and maturity curve functions to back out hydra input pars # maturity from maturity ogive input in biol file # growth functions from length at age in model... 
```r
# write the csvs
o.name <- here("simulated-data/msinput/hydra/")

write_csv(foodweb, paste0(o.name,"foodweb_NOBA.csv"))
write_csv(specieslist, paste0(o.name,"speciesList_NOBA.csv"))
write_csv(lenwtpar, paste0(o.name,"lengthweight_species_NOBA.csv"))
write.csv(intakespp, paste0(o.name,"intake_species_NOBA.csv"))
write.csv(sizepref, paste0(o.name,"mortality_M2sizePreference_NOBA.csv"))
write.table(t(singletons), paste0(o.name,"singletons_NOBA.csv"), sep=",", col.names=FALSE)
write.csv(M1, paste0(o.name,"mortality_M1_NOBA.csv"))
write_csv(dummycovar, paste0(o.name,"recruitment_covariates_NOBA.csv"))
write_csv(dummycovar, paste0(o.name,"maturity_covariates_NOBA.csv"))
write_csv(dummycovar, paste0(o.name,"growth_covariates_NOBA.csv"))
write_csv(dummyeffort, paste0(o.name,"observation_effort_NOBA.csv"))
write.csv(sexratio, paste0(o.name,"sexratio_NOBA.csv"))
```
Decisions on size bins for hydra: keep 5 bins per species for now, build in flexibility to allocate size data to any number of bins with any definitions.
Species specific bin width definitions are an input into the model, with more refined (smaller) bin sizes during rapid growth phases.
Initially, we'll directly transfer bin widths for species common to GB and NOBA:
We will substitute bin widths from these similar taxa/size GB species for NOBA:
We will modify or substitute bin widths for the rest of the NOBA species based on size:
Code to make sizebins csv for input; hardcoded based on these decisions.
```r
# for reference, GB hydra_sim size bins
# sizeclass1,sizeclass2,sizeclass3,sizeclass4,sizeclass5,commentField-notused
# 20,20,20,20,30,spinydog
# 20,20,20,20,40,winterskate
# 5,5,5,5,20,Aherring
# 20,20,20,40,50,Acod
# 10,10,20,20,20,haddock
# 10,10,10,10,20,yellowtailfl
# 10,10,10,10,10,winterfl
# 10,10,10,10,10,Amackerel
# 10,10,10,10,30,silverhake
# 20,20,20,30,40,goosefish

sizebinmat <- matrix(0, singletons$Nspecies, singletons$Nsizebins,
                     dimnames = list(c(sort.default(sppsubset$Name)),
                                     c(paste0("sizeclass",seq(1:singletons$Nsizebins)))))

sizebinmat["Blue_whiting",] <- c(10,10,10,10,30)
sizebinmat["Capelin",] <- c(5,5,5,5,5)
sizebinmat["Green_halibut",] <- c(20,20,20,40,50)
sizebinmat["Haddock",] <- c(10,10,20,20,20)
sizebinmat["Long_rough_dab",] <- c(10,10,10,10,20)
sizebinmat["Mackerel",] <- c(10,10,10,10,10)
sizebinmat["North_atl_cod",] <- c(20,20,20,40,50)
sizebinmat["Norwegian_ssh",] <- c(5,5,5,5,20)
sizebinmat["Polar_cod",] <- c(5,5,5,5,5)
sizebinmat["Redfish",] <- c(10,10,10,10,10)
sizebinmat["Saithe",] <- c(20,20,20,30,40)

sizebins <- as.data.frame(sizebinmat) %>%
  mutate('commentField-notused' = rownames(sizebinmat))

write_csv(sizebins, paste0(o.name,"length_sizebins_NOBA.csv"))
```
These decisions on matching species can carry through to other life history parameters, at least initially.
Need to fit growth and maturity parameters from Atlantis output for input into hydra. Fit growth using the code below, based on the package FSA [@ogle-fsa-2021]. The package provides multiple ways to get starting values for the R `nls` function for fitting nonlinear models. It was necessary to try several different methods for starting values to get at least one von Bertalanffy growth parameter set for each species. A scratch code block shows some of the experiments to get the fitting working that resulted in the function to fit for all surveys.
Note that spring surveys don't get mackerel at all (not in the model domain then). A full set of von B parameters could only be estimated for the first era of both versions of the fall survey tested here.
sp_age <- NOBAom_ms$funct.group_ss[, c("Name", "NumCohorts", "NumAgeClassSize")] sp_age <- sp_age %>% mutate(n_annages = NumCohorts * NumAgeClassSize) stepperyr <- if(NOBAom_ms$runpar$outputstepunit=="days") 365/NOBAom_ms$runpar$toutinc # need growth, vonB t0, K, and Linf and simple growth function # would you really base this on 140 years of data? I used all survey years len_comp_data <- read_savedsurvs(d.name, 'survLen') # need to shift agecl to represent midpoint of actual age groups # using part of the wtage interpolation function calc_avgwtstage2age() annage_comp_data <- read_savedsurvs(d.name, 'survAnnAge') vb1 <- vbFuns() # VBGF packages all take age and length pairs, reformat from length comp at agecl # break into 3 chunks of time, "era" 48 years each more like our survey # subsample after reformatting with uncount because we wouldn't have this many ages #multiple surveys named in list object, use as filename for(s in names(len_comp_data)){ annages <- annage_comp_data[[s]][[1]] %>% group_by(species, time, agecl) %>% rename(trueage = agecl) %>% summarise(truenatage = sum(atoutput)) # Figure out the groups that have multiple ages classes in each stage (or # cohort) multiple_ages <- sp_age[sp_age$NumAgeClassSize>1,] # finds the average age of each agecl each timestep based on truenatage svlen_avgage <- annages %>% dplyr::semi_join(len_comp_data[[s]][[1]], by = c("species", "time")) %>% dplyr::inner_join(multiple_ages, by = c("species" = "Name")) %>% dplyr::mutate(agecl = as.integer(ceiling(trueage/NumAgeClassSize))) %>% dplyr::group_by(species, time, agecl) %>% dplyr::mutate(avgage = weighted.mean(trueage, truenatage)) #arrange into long format: year, Species, agecl, length svlenage <- len_comp_data[[s]][[1]] %>% dplyr::left_join(svlen_avgage) %>% dplyr::mutate(year = ceiling(time/stepperyr)) %>% dplyr::select(species, agecl, avgage, year, upper.bins, atoutput) %>% dplyr::mutate(avgage = coalesce(avgage,agecl)) %>% dplyr::rename(length = upper.bins) %>% dplyr::mutate(era = as.numeric(cut_number(year,3))) %>% tidyr::uncount(atoutput) %>% dplyr::group_by(species, avgage, year) %>% dplyr::sample_frac(0.01) saveRDS(svlenage, file.path(d.name, paste0(scenario.name, "_", s, "agelensamp.rds"))) # } # # # read saved sample from above # survs <- list.files(path=d.name, pattern = "agelensamp.rds", full.names = TRUE) # # survname <- str_match(survs, paste0(scenario.name,"_\\s*(.*?)\\s","*agelensamp.rds"))[,2] # # agelensamp_data <- lapply(survs, readRDS) # # names(agelensamp_data) <- survname # # # for(s in names(agelensamp_data)){ # try different start points to see if we can get more species fit # this one goes to 11! 
# need because group_map doesn't keep group names gk <- svlenage %>% group_by(species, era) %>% group_keys() svvbgf3.2 <- svlenage %>% dplyr::group_by(species, era) %>% group_map(~try(nls(length~vb1(avgage, Linf, K, t0), data = ., start = vbStarts(length~avgage,data = .,methLinf="longFish")))) #this gets exponential growth pars, probably won't use this growth mod though svexp3 <- svlenage %>% dplyr::group_by(species, era) %>% group_modify(~ broom::tidy(lm(log(length) ~ log(avgage), data = .x))) %>% select(species, era, term, estimate) %>% pivot_wider(names_from = "term", values_from = "estimate") %>% mutate(psi = exp(`(Intercept)`)) %>% rename(kappa = `log(avgage)`) %>% select(species, era, kappa, psi) vbout.2 <- data.frame() for(i in 1:length(svvbgf3.2)) { if(is.list(svvbgf3.2[[i]])) { spe <- cbind(as.data.frame(gk)[i,], as.data.frame(t(coef(svvbgf3.2[[i]])))) vbout.2 <- rbind(vbout.2, spe) } } growthpars <- left_join(svexp3, vbout.2) %>% mutate(growthType = case_when( is.na(Linf) ~ 2, TRUE ~ 4)) # vonB default, exponential if no vonB # sadly doesn't work so just make a csv for each era and use the one with most spp # maxera <- growthpars %>% # filter(!is.na(Linf)) %>% # group_by(era) %>% # tally() %>% # filter(n==1) %>% # select(era) growthspp <- growthpars %>% select(species, psi, kappa, Linf, K, growthType) %>% pivot_longer(psi:growthType, names_to = "par") %>% pivot_wider(names_from = species, values_from = value) growthspp.era <- split(growthspp, list(growthspp$era)) for(era in names(growthspp.era)) { gpar <- growthspp.era[[era]] %>% ungroup()%>% select(-era) %>% remove_rownames() %>% column_to_rownames(var="par") %>% write.csv(paste0(o.name,"growth_species_NOBA_era",era,"_",s,".csv")) } }
This code block has experiments with different fits and different ways to split the length at age data.
capelinvbgf <- svlenage %>% filter(species == "Capelin") %>% group_by(era) %>% group_map(try(~nls(length~vb1(avgage, Linf, K, t0), data = ., start = vbStarts(length~avgage,data = .)))) capelinall <- svlenage %>% filter(species == "Capelin") plot(length~avgage,data=capelinall,pch=19,col=rgb(0,0,0,1/5)) curve(vb1(x,Linf=coef(capelinvbgf[[1]])),from=0,to=5,col="red",lwd=2,add=TRUE) curve(vb1(x,Linf=coef(capelinvbgf[[2]])),from=0,to=5,col="blue",lwd=2,add=TRUE) curve(vb1(x,Linf=coef(capelinvbgf[[3]])),from=0,to=5,col="green",lwd=2,add=TRUE) # herringvbgf <- svlenage %>% # filter(species == "Norweigan_ssh", era==1) %>% # group_map(try(~nls(length~vb1(avgage, Linf, K, t0), data = ., # start = vbStarts(length~avgage,data = .)))) #Error in nls(length ~ vb1(avgage, Linf, K, t0), data = ., start = vbStarts(length ~ : #singular gradient with group_map, era 2 and 3 codvbgf <- svlenage %>% filter(species == "North_atl_cod") #group_by(era) %>% codfit <- nls(length~vb1(avgage, Linf, K, t0), data = codvbgf, start = vbStarts(length~avgage,data = codvbgf)) plot(length~avgage,data=codvbgf,pch=19,col=rgb(0,0,0,1/5)) curve(vb1(x,Linf=coef(codfit)),from=0,to=20,col="red",lwd=2,add=TRUE) # era==1 Nonlinear regression model # model: length ~ vb1(avgage, Linf, K, t0) # data: codvbgf # Linf K t0 # 104.7120 0.4260 0.2176 # residual sum-of-squares: 4109723 # # Number of iterations to convergence: 7 # Achieved convergence tolerance: 2.169e-06 # all years Nonlinear regression model # model: length ~ vb1(avgage, Linf, K, t0) # data: codvbgf # Linf K t0 # 112.4694 0.3912 0.2097 # residual sum-of-squares: 7354906 # # Number of iterations to convergence: 11 # Achieved convergence tolerance: 9.964e-06 hadvbgf <- svlenage %>% filter(species == "Haddock") #group_by(era) %>% hadfit <- nls(length~vb1(avgage, Linf, K, t0), data = hadvbgf, start = vbStarts(length~avgage,data = hadvbgf)) plot(length~avgage,data=hadvbgf,pch=19,col=rgb(0,0,0,1/5)) curve(vb1(x,Linf=coef(hadfit)),from=0,to=20,col="red",lwd=2,add=TRUE) # all years # Nonlinear regression model # model: length ~ vb1(avgage, Linf, K, t0) # data: hadvbgf # Linf K t0 # 77.0277 0.3601 -0.3546 # residual sum-of-squares: 15637500 # # Number of iterations to convergence: 11 # Achieved convergence tolerance: 4.187e-06 redvbgf <- svlenage %>% filter(species == "Redfish") #group_by(era) %>% redfit <- nls(length~vb1(avgage, Linf, K, t0), data = redvbgf, start = vbStarts(length~avgage,data = redvbgf)) plot(length~avgage,data=redvbgf,pch=19,col=rgb(0,0,0,1/5)) curve(vb1(x,Linf=coef(redfit)),from=0,to=40,col="red",lwd=2,add=TRUE) #all years # Nonlinear regression model # model: length ~ vb1(avgage, Linf, K, t0) # data: redvbgf # Linf K t0 # 40.8016 0.1165 -1.6688 # residual sum-of-squares: 3197990 # # Number of iterations to convergence: 13 # Achieved convergence tolerance: 8.316e-06 saithe1 <- svlenage %>% filter(species=="Saithe", era==1) saithe1fit <- nls(length~vb1(avgage, Linf, K, t0), data = saithe1, start = vbStarts(length~avgage,data = saithe1)) plot(length~avgage,data=saithe1,pch=19,col=rgb(0,0,0,1/5)) curve(vb1(x,Linf=coef(saithe1fit)),from=0,to=20,col="red",lwd=2,add=TRUE) #Nonlinear regression model # model: length ~ vb1(avgage, Linf, K, t0) # data: saithe1 # Linf K t0 # 388.18797 0.01975 -3.94798 # residual sum-of-squares: 5691213 # # Number of iterations to convergence: 5 # Achieved convergence tolerance: 5.511e-06 #plot len-age data and start values, some are bad svlenage_pl <- svlenage %>% group_by(species, era) %>% 
group_map(~vbStarts(length~avgage,data=cur_group(),plot=TRUE)) ### full time series fits with different start points svvbgf <- svlenage %>% dplyr::group_by(species) %>% group_map(~try(nls(length~vb1(avgage, Linf, K, t0), data = ., start = vbStarts(length~avgage,data = .)))) spk <- svlenage %>% group_by(species) %>% group_keys() vballout <- data.frame() for(i in 1:length(svvbgf)) { if(is.list(svvbgf[[i]])) { sp <- cbind(as.data.frame(spk)[i,], t(coef(svvbgf[[i]]))) vballout <- rbind(vballout, sp) } } svvbgf.1 <- svlenage %>% dplyr::group_by(species) %>% group_map(~try(nls(length~vb1(avgage, Linf, K, t0), data = ., start = vbStarts(length~avgage,data = .,methLinf="oldAge")))) vballout.1 <- data.frame() for(i in 1:length(svvbgf.1)) { if(is.list(svvbgf.1[[i]])) { sp <- cbind(as.data.frame(spk)[i,], t(coef(svvbgf.1[[i]]))) vballout.1 <- rbind(vballout.1, sp) } } svvbgf.2 <- svlenage %>% dplyr::group_by(species) %>% group_map(~try(nls(length~vb1(avgage, Linf, K, t0), data = ., start = vbStarts(length~avgage,data = .,methLinf="longFish")))) vballout.2 <- data.frame() for(i in 1:length(svvbgf.2)) { if(is.list(svvbgf.2[[i]])) { sp <- cbind(as.data.frame(spk)[i,], t(coef(svvbgf.2[[i]]))) vballout.2 <- rbind(vballout.2, sp) } } ################## by era gk <- svlenage %>% group_by(species, era) %>% group_keys() svvbgf3 <- svlenage %>% dplyr::group_by(species, era) %>% group_map(~try(nls(length~vb1(avgage, Linf, K, t0), data = ., start = vbStarts(length~avgage,data = .)))) vbout <- data.frame() for(i in 1:length(svvbgf3)) { if(is.list(svvbgf3[[i]])) { spe <- cbind(as.data.frame(gk)[i,], t(coef(svvbgf3[[i]]))) vbout <- rbind(vbout, spe) } } # try different start points to see if we can get more species fit svvbgf3.1 <- svlenage %>% dplyr::group_by(species, era) %>% group_map(~try(nls(length~vb1(avgage, Linf, K, t0), data = ., start = vbStarts(length~avgage,data = .,methLinf="oldAge")))) vbout.1 <- data.frame() for(i in 1:length(svvbgf3.1)) { if(is.list(svvbgf3.1[[i]])) { spe <- cbind(as.data.frame(gk)[i,], t(coef(svvbgf3.1[[i]]))) vbout.1 <- rbind(vbout.1, spe) } } # try different start points to see if we can get more species fit # this one goes to 11! svvbgf3.2 <- svlenage %>% dplyr::group_by(species, era) %>% group_map(~try(nls(length~vb1(avgage, Linf, K, t0), data = ., start = vbStarts(length~avgage,data = .,methLinf="longFish")))) vbout.2 <- data.frame() for(i in 1:length(svvbgf3.2)) { if(is.list(svvbgf3.2[[i]])) { spe <- cbind(as.data.frame(gk)[i,], t(coef(svvbgf3.2[[i]]))) vbout.2 <- rbind(vbout.2, spe) } }
Generating length composition input data requires mapping numbers at length from the atlantisom 1 cm bin length output to our selected length bins, then summing the numbers at length in those bins for all species. The input required is a 4d array, indexed by survey, species, year, and sizebin. Each survey can therefore be a separate csv: for each species, year rows with sizebin columns holding the numbers at length, stacked one species after another (see the illustration below).
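For illustration only (species, years, and counts are made up, and the column names follow the sizebin definitions written above), one survey's length composition csv is laid out roughly like this:

```
species,year,sizeclass1,sizeclass2,sizeclass3,sizeclass4,sizeclass5
Capelin,1,1200,85000,43000,NA,NA
Capelin,2,1500,90000,41000,NA,NA
...
North_atl_cod,1,300,5200,20000,12000,800
...
```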
Arrange lengths into appropriate bins:
```r
# rearrange into length bins and write length comp data csvs
# modbins from sizebins above, differ by species so need modbin.min and modbin.max
# pcumsum is an FSA function
modbins <- sizebins %>%
  rename(species = `commentField-notused`) %>%
  pivot_longer(!species, names_to = "sizebin", values_to = "binwidth") %>%
  group_by(species) %>%
  arrange(sizebin) %>%
  mutate(modbin.min = pcumsum(binwidth),
         modbin.max = cumsum(binwidth)) %>%
  arrange(species, sizebin)

#multiple surveys named in list object, use as filename
for(s in names(len_comp_data)){
  #arrange into long format: year, Species, agecl, length
  svlenbin <- len_comp_data[[s]][[1]] %>%
    dplyr::mutate(year = ceiling(time/stepperyr)) %>%
    dplyr::select(species, agecl, year, upper.bins, atoutput) %>%
    dplyr::left_join(modbins) %>%
    dplyr::filter(modbin.min < upper.bins & upper.bins <= modbin.max) %>%
    dplyr::group_by(species, year, sizebin) %>%
    dplyr::summarise(sumlen = sum(atoutput)) %>%
    tidyr::spread(sizebin, sumlen) %>%
    write_csv(paste0(o.name,"observation_lengths_NOBA_",s,".csv"))
}
```
The first pass through with the current bins results in very few lengths in the smallest bin for most species. This suggests we need to change the binwidths and possibly the number of bins to optimize length fitting. Perhaps a larger first bin or don't start it at 0?
Consider a function to do this using species max size, the vonB k and the desired number of bins for more consistency and to easily change Nsizebins.
This function is below, but with only 5 bins I don't like how it works: it allocates more of the smaller bins to the smallest sizes, which we don't observe, and aggregates many lengths into one large bin. Therefore the simplest approach is to have equal-sized bins (max observed length divided by the desired number of bins). We will try this for the length comps.
# see e.g., https://www.sciencedirect.com/science/article/abs/pii/S0165783619303297 # estimated optimal bin size for VBGF estimation is 0.23 x Lmax ^0.6 # see Ogle citing Schnute and Fornier on K as a growth increment # https://derekogle.com/NCNRS349/modules/Growth/BKG # K exponential rate of approach to asymptotic size # exp(-K) fraction by which growth increment is multiplied each year # at what length is increment*exp(-K) small? group those lengths into bigger bins # pull observed min and max size from data to optimize start and end bins minmaxlen <- list() # read in saved vonB pars--see if seasons are different and will need a replacement if K wasn't estimated #era <- 1 vonBK <- list() minmaxvonBK <- list() for(s in names(len_comp_data)){ minmaxlendf <- len_comp_data[[s]][[1]] %>% dplyr::group_by(species) %>% dplyr::summarise(minlen = min(lower.bins, na.rm = TRUE), maxlen = max(upper.bins, na.rm = TRUE), range = maxlen-minlen) for(era in 1:3){ vonBKer <- read.csv(paste0(o.name,"growth_species_NOBA_era",era,"_",s,".csv")) %>% rename(par = X) %>% pivot_longer(!par, names_to = "species") %>% pivot_wider(names_from = par, names_prefix = era, values_from = value) %>% select(species, paste0(era,"K")) ifelse(era==1, vonBKdf <- vonBKer, vonBKdf <- left_join(vonBKdf, vonBKer)) } minmaxvonBKdf <- left_join(minmaxlendf,vonBKdf) minmaxlen[[s]] <- minmaxlendf vonBK[[s]] <- vonBKdf minmaxvonBK[[s]] <- minmaxvonBKdf } # note! # Green_halibut and Saithe have maxlen at 200 in at least one survey, # may need higher maxlength when running atlantisom comps wrapper, fix survey config files estbinwidths <- function(Lmax, vonBk, Nsizebins, thresh = 0.05){ #optimal bin size is lower end binopt <- 0.23*Lmax^0.6 #range gives initial default binwidth, all equal (range/Nbins) #or should it be max/nbins? #can use if no K estimated bindef <- max(1,ceiling(Lmax/Nsizebins)) #minmaxvonBK$BTS_fall_allbox_effic1 %>% group_by(species) %>% mutate(bindef = ceiling(maxlen/5)) #vonBk tells you how many fine bins for fast growth #low vonBK is doesn't asympote, divide length range equally into nbins? #high vonBK asympotes quickly, # threshold cumprod of 0.05 or 0.1 to define last bin? tiny growth increment # use binopt up to this threshold? # plot(cumprod(rep(1, 160)*exp(-0.0386)), ylim=c(0,1), xlim=c(0,180)) #saithe # lines(cumprod(rep(1, 20)*exp(-1.4)), ylim=c(0,1), xlim=c(0,180)) #capelin # lines(cumprod(rep(1, 40)*exp(-0.163)), ylim=c(0,1), xlim=c(0,180)) #redfish # lines(cumprod(rep(1, 130)*exp(-0.344)), ylim=c(0,1), xlim=c(0,180)) #cod lenthresh <- max(which(cumprod(rep(1, Lmax)*exp(-vonBk)) > thresh)) #gives 1 cm increments of fast growth #put all lengths above this into one big bin, binwidth.max <- Lmax-lenthresh #divide lengths below this by thresholds? for min 5 bins, use .5, .3, .2, .1, .05? # or divide equally into nbins-1 bins? 
equal may be best # hightest growth bin will be very small and we won't have fish in it # dont go smaller than binopt binwidth.gro <- max(ceiling(binopt), ceiling(lenthresh/(Nsizebins-1))) #aggregate bins in multiples of optimal for bigger fish/slower growth to get Nsizebins ifelse(binwidth.max > bindef & binwidth.max < 0.5*Lmax, binwidths <- c(rep(binwidth.gro, Nsizebins-1), binwidth.max), binwidths <- rep(bindef, Nsizebins)) return(binwidths) } equalbinwidths <- function(Lmax, Nsizebins){ bindef <- max(1,ceiling(Lmax/Nsizebins)) binwidths <- rep(bindef, Nsizebins) return(binwidths) } # for fixed bins across surveys find min and max observed lengths maxLrange <- bind_rows(minmaxvonBK) %>% group_by(species) %>% summarise(minminL = min(minlen), maxmaxL = max(maxlen)) Nsizebins <- 5 newbins <- maxLrange %>% group_by(species) %>% group_modify(~broom::tidy(equalbinwidths(.$maxmaxL, Nsizebins))) %>% mutate(lb = paste0("sizebin",seq(1:Nsizebins))) %>% pivot_wider(names_from = lb, values_from = x) sizebins <- as.data.frame(newbins) %>% rename('commentField-notused' = species) %>% relocate('commentField-notused', .after = last_col()) write_csv(sizebins, paste0(o.name,"length_sizebins_NOBA.csv"))
Reran the lendat code block above to make the survey-specific length input files with the new equal-sized bins.
Now write input files in Gavin's format to have surveys together in one file. Note that this is just a test: the folder has two different survey and two different fishery configurations, and they are not independent, so we wouldn't use all of these in an input file. Also, fishery catch is not currently separated into fleets. Fishery length comps are added here as well:
modbins <- sizebins %>% rename(species = `commentField-notused`) %>% pivot_longer(!species, names_to = "sizebin", values_to = "binwidth") %>% group_by(species) %>% arrange(sizebin) %>% mutate(modbin.min = pcumsum(binwidth), modbin.max = cumsum(binwidth)) %>% arrange(species, sizebin) allsvlenbin <- tibble() #multiple surveys named in list object, add to file for(s in names(len_comp_data)){ #arrange into long format: year, Species, agecl, length svlenbin <- len_comp_data[[s]][[1]] %>% dplyr::mutate(year = ceiling(time/stepperyr)) %>% dplyr::select(species, agecl, year, upper.bins, atoutput) %>% dplyr::left_join(modbins) %>% dplyr::filter(modbin.min < upper.bins & upper.bins <= modbin.max) %>% dplyr::group_by(species, year, sizebin) %>% dplyr::summarise(sumlen = sum(atoutput)) %>% tidyr::spread(sizebin, sumlen) %>% ungroup() %>% dplyr::mutate(inpN = rowSums(.[,-(1:2)], na.rm = TRUE)) %>% dplyr::mutate(type = 0) %>% dplyr::mutate(survey = s) %>% dplyr::mutate_at(vars(contains("sizebin")), ~./inpN) %>% dplyr::select(survey, year, species, type, inpN, everything()) allsvlenbin <- bind_rows(allsvlenbin, svlenbin) } write_csv(allsvlenbin, paste0(o.name,"observation_lengths_NOBA_allsurvs.csv")) #requires read_savedfisheries function below, add to atlantisom catchlen_ss <- read_savedfisheries(d.name, "catchLen") allfishlenbin <- tibble() for(f in names(catchlen_ss)){ #arrange into long format: year, Species, agecl, length fishlenbin <- catchlen_ss[[f]][[1]] %>% dplyr::mutate(year = ceiling(time/stepperyr)) %>% dplyr::select(species, agecl, year, upper.bins, atoutput) %>% dplyr::left_join(modbins) %>% dplyr::filter(modbin.min < upper.bins & upper.bins <= modbin.max) %>% dplyr::group_by(species, year, sizebin) %>% dplyr::summarise(sumlen = sum(atoutput)) %>% tidyr::spread(sizebin, sumlen) %>% ungroup() %>% dplyr::mutate(inpN = rowSums(.[,-(1:2)], na.rm = TRUE)) %>% dplyr::mutate(type = 0) %>% dplyr::mutate(fishery = f, area = 1) %>% dplyr::mutate_at(vars(contains("sizebin")), ~./inpN) %>% dplyr::select(fishery, area, year, species, type, inpN, everything()) %>% dplyr::arrange(fishery, area, species, year) allfishlenbin <- bind_rows(allfishlenbin, fishlenbin) } write_csv(allfishlenbin, paste0(o.name,"observation_lengths_NOBA_allfisheries.csv"))
Stomach weights are needed as input to hydra. At present each species has a stomach weight for each size bin that is repeated for each year. We have total consumed weight for each predator age class at each timestep in the detailed diet Atlantis output, so we would need to map age classes to length bins, sum consumed weight by length bin, and divide by total numbers in that length bin to get stomach weight.
What we have saved so far is a diet comp in proportion, with bias and observation error added to the proportions. However, we need the stomach weight portion of the equation, which has the survey timing/area/efficiency/selectivity applied to true consumed weight. The hydra input is mean daily stomach weight in grams. From the Atlantis manual, "DetailedDietCheck.txt returns the total consumed biomass since the last output given for each cell (box and layer) of each age group of each functional group." So daily per capita consumed biomass is this output, summed over prey for total consumption, divided by the numbers from the same survey design, divided by the output step `omlist_ss$runpar$outputstep`.
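As an equation (following the description above; consumed biomass in tons is converted to grams), the mean daily per capita stomach content weight for predator species $s$ and age class $a$ at time $t$ is

$$\bar{S}_{s,a,t} \;=\; \frac{\sum_{p} C_{s,a,p,t}}{N_{s,a,t}\; \Delta t_{out}} \times 10^{6},$$

where $C_{s,a,p,t}$ is survey-sampled consumed biomass of prey $p$, $N_{s,a,t}$ is survey-sampled numbers, and $\Delta t_{out}$ is `omlist_ss$runpar$outputstep` in days. Note that the code below also tries treating the consumption output as a snapshot rather than a cumulative value, in which case the $\Delta t_{out}$ division is dropped.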
# apply surveys to true total consumption (snipped from atlantosom om_diet.R and om_comps.R) detaileddiet <- readRDS(file.path(d.name, paste0(scenario.name, "detaileddiet.rds"))) usersurvey <- c(here("simulated-data/config/mssurvey_spring.R"), here("simulated-data/config/mssurvey_fall.R"), here("simulated-data/config/mssurvey_spring_01.R"), here("simulated-data/config/mssurvey_fall_01.R")) omlist_ss <- readRDS(file.path(d.name, paste0(scenario.name, "omlist_ss.rds"))) source(here("simulated-data/config/omdimensions.R")) survObsStomWt <- list() for (s in usersurvey) { source(s, local = TRUE) # survtime doesn't match units of time.days in detaileddiet survtime <- survey_sample_full*omlist_ss$runpar$outputstep # apply survey design to detailed diet survey_cons <- create_survey_diet(dat = detaileddiet, time = survtime, species = survspp, boxes = survboxes, effic = surveffic, selex = survselex) # get numbers at ageclass for same survey design # note different time units! survey_N <- atlantisom::create_survey(dat = omlist_ss$truenums_ss, time = survey_sample_full, species = survspp, boxes = survboxes, effic = surveffic, selex = survselex.agecl) # convert survey N times to cons times in days survey_N$time <- survey_N$time*omlist_ss$runpar$outputstep # get rid of polygon survey_totN <- survey_N %>% group_by(species, agecl, time) %>% summarise(totN = sum(atoutput)) %>% ungroup() # sum over prey to get total consumption in t, divide by N and timestep, convert to g survey_totcons <- survey_cons %>% group_by(species, agecl, time) %>% summarise(totcons = sum(atoutput)) %>% ungroup() %>% left_join(survey_totN) %>% mutate(percap_cons = totcons/totN) %>% #mutate(daily_percap_g = percap_cons/omlist_ss$runpar$outputstep*1000000) mutate(daily_percap_g = percap_cons*1000000) #try assuming cons is snapshot not cumulative since last timestep #save survey consumption, takes a long time to generate with lots of reps/species #if(save){ saveRDS(survey_cons, file.path(d.name, paste0(scenario.name, "_", survey.name, "surveycons.rds"))) #} survObsStomWt[[survey.name]] <- survey_cons } # map to length bins using surey numbers at agecl in length bins # #multiple surveys named in list object, use as filename # for(s in names(all_diets)){ # #arrange into wide format: year, Species1, Species2 ... and write csv # svdiet <- all_diets[[s]][[1]] %>% # #calc_timestep2time(d.name, ., run.prm.file) %>% #this gives an arbitrary start year of 1970 # dplyr::mutate(time = time.days/NOBAom_ms$runpar$toutinc) %>% # dplyr::left_join(., trueBagecl) %>% # dplyr::mutate(dietBwt = dietSamp*trueBatage) %>% # dplyr::group_by(species,time,prey,trueB) %>% # dplyr::summarise(totdietBwt = sum(dietBwt)) %>% # dplyr::mutate(totdietComp = totdietBwt/trueB) %>% # dplyr::mutate(year = ceiling(time/stepperyr)) %>% # dplyr::ungroup() %>% # dplyr::select(year, species, prey, totdietComp) %>% # write_csv(paste0(o.name,"diet_",s,".csv")) # }
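The mapping from predator age class to size bin flagged in the comments above is not completed in this chunk. As a minimal sketch of one possible approach (not the final method), assuming the `svagelenbin` age-class-to-size-bin proportions built for the diet data in the next chunk and the `survey_totcons` table with its `daily_percap_g` column from above:

```r
# Sketch only: average per capita daily stomach content weight (g) by size bin,
# weighting each age class by its contribution (propage) to the size bin.
stomwt_sizebin <- survey_totcons %>%
  dplyr::mutate(year = ceiling(time/365)) %>%                 # time here is in days
  dplyr::left_join(svagelenbin, by = c("species", "agecl", "year")) %>%
  dplyr::filter(!is.na(sizebin)) %>%
  dplyr::group_by(species, year, sizebin) %>%
  dplyr::summarise(meanstomwt_g = weighted.mean(daily_percap_g, propage, na.rm = TRUE))
```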
We also plan to fit to time series of prey proportions for each predator size class. Gavin suggests an input format similar to those for survey lengths. Diet output from surveys is by predator `agecl`, so we map predator age classes to size bins using the survey length composition at age class, as in the code below.
# same file as for MSSPM, diet proportion by agecl all_diets <- read_savedsurvs(d.name, 'survDiet') # get size bins for predators sizebins <- read_csv(paste0(o.name,"length_sizebins_NOBA.csv")) modbins <- sizebins %>% rename(species = `commentField-notused`) %>% pivot_longer(!species, names_to = "sizebin", values_to = "binwidth") %>% group_by(species) %>% arrange(sizebin) %>% mutate(modbin.min = pcumsum(binwidth), modbin.max = cumsum(binwidth)) %>% arrange(species, sizebin) # get Dirichlet alphamults from config files svcon <- list.files(path=here("/simulated-data/config/"), pattern = "*survey*", full.names = TRUE) svalphamultlook <- tibble() for(c in 1:length(svcon)){ source(svcon[c]) svalp <- tibble( surv_alphamult_n = alphamult, survey = survey.name ) svalphamultlook <- bind_rows(svalphamultlook, svalp) } # diet is by predator agecl, need diet by predator sizebin # map predator agecl to lengthbin using length_comp_data from same survey allsvdietprop <- tibble() #multiple surveys named in list object for(s in names(all_diets)){ #arrange into lookup for agecl to lengthbin in numbers of fish svagelenbin <- len_comp_data[[s]][[1]] %>% dplyr::mutate(year = ceiling(time/stepperyr)) %>% dplyr::select(species, agecl, year, upper.bins, atoutput) %>% dplyr::left_join(modbins) %>% dplyr::filter(modbin.min < upper.bins & upper.bins <= modbin.max) %>% dplyr::group_by(species, year, agecl, sizebin) %>% dplyr::summarise(sumlen = sum(atoutput)) %>% dplyr::group_by(species, year, sizebin) %>% dplyr::mutate(propage = sumlen/sum(sumlen)) #proportion of each agecl contributing to lengthbin svdietprop <- all_diets[[s]][[1]] %>% dplyr::mutate(year = ceiling(time.days/365)) %>% # join with agecl to length lookup dplyr::select(species, agecl, year, prey, dietSamp) %>% dplyr::left_join(svagelenbin) %>% dplyr::mutate(dietpropage = dietSamp*propage) %>% #reweight diets for lengthbins dplyr::group_by(species, year, sizebin, prey) %>% dplyr::summarise(dietsize = sum(dietpropage)) %>% dplyr::filter(prey %in% unique(modbins$species)) %>% #drops predators that don't eat our modeled species tidyr::spread(prey, dietsize) %>% ungroup() %>% dplyr::filter(!is.na(sizebin)) %>% dplyr::mutate(allotherprey = 1-rowSums(.[,-(1:3)], na.rm = TRUE)) %>% dplyr::mutate(survey = s) %>% dplyr::left_join(svalphamultlook) %>% #what is effective sample size for Dirichlet? dplyr::rename(inpN = surv_alphamult_n) %>% #use this input for now dplyr::select(survey, year, species, sizebin, inpN, everything()) allsvdietprop <- bind_rows(allsvdietprop, svdietprop) } write_csv(allsvdietprop, paste0(o.name,"observation_diets_NOBA_allsurvs.csv")) #
Do we need to fit recruitment? We may just need average recruitment starting values, which I can get from Atlantis. Maturity parameters are needed for SSB calculation and SSB-based recruitment. If we fit only to average recruitment with annual deviations, and don't need SSB, maturity and recruitment inputs could be dummy parameters for now.
```r
# need 2-par maturity curve based on length, atlantis maturity ogive based on age group
# or just translate age at 50% maturity to length and call it good?
```
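One simple option flagged above is to translate age at 50% maturity from the Atlantis maturity ogive into a length at 50% maturity via a fitted von Bertalanffy curve. A minimal sketch, assuming a made-up A50 and using the all-years cod growth fit reported in the scratch block above:

```r
# Hypothetical example: convert age at 50% maturity (A50) to length at 50% maturity
vb_len <- function(age, Linf, K, t0) Linf * (1 - exp(-K * (age - t0)))
A50 <- 4                                                      # placeholder age at 50% maturity
L50 <- vb_len(A50, Linf = 112.47, K = 0.3912, t0 = 0.2097)    # cod all-years fit from above
L50                                                           # length (cm) at 50% maturity
```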
Survey and catch datasets for hydra from NOBA (same methods as MSSPM). Make this a function that takes the output directory name and output names as arguments; one possible shape for such a helper is sketched below.
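A sketch of one possible helper (not implemented here; the unrefactored code below is what was actually run), taking the Atlantis output directory, the output directory, and a file name prefix as arguments, and using the `read_savedsurvs` reader and `stepperyr` as defined above:

```r
# Sketch: write wide-format survey biomass csvs for all saved surveys to an
# output directory, with a caller-supplied file name prefix.
write_survey_biomass <- function(d.name, o.name, prefix, stepperyr) {
  survObsBiom <- read_savedsurvs(d.name, 'survB')
  for (s in names(survObsBiom)) {
    survObsBiom[[s]][[1]] %>%
      dplyr::mutate(year = ceiling(time/stepperyr)) %>%
      dplyr::select(species, year, atoutput) %>%
      tidyr::spread(species, atoutput) %>%
      readr::write_csv(paste0(o.name, prefix, s, ".csv"))
  }
}
# e.g. write_survey_biomass(d.name, o.name, "observation_biomass_NOBA_", stepperyr)
```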
o.name <- here("simulated-data/msinput/hydra/") stepperyr <- if(NOBAom_ms$runpar$outputstepunit=="days") 365/NOBAom_ms$runpar$toutinc #read in survey biomass data survObsBiom <- read_savedsurvs(d.name, 'survB') #reads in all surveys #multiple surveys named in list object, use as filename for(s in names(survObsBiom)){ #arrange into wide format: year, Species1, Species2 ... and write csv svbio <- survObsBiom[[s]][[1]] %>% #calc_timestep2time(d.name, ., run.prm.file) %>% #this gives an arbitrary start year of 1970 dplyr::mutate(year = ceiling(time/stepperyr)) %>% dplyr::select(species, year, atoutput) %>% tidyr::spread(species, atoutput) %>% write_csv(paste0(o.name,"observation_biomass_NOBA_",s,".csv")) } #read in catch data read_savedfisheries <- function(dir, type){ datlook <- data.frame(dattype = c('Catch', 'catchAge', 'catchLen', 'catchWtage', 'catchAnnAge', 'catchAnnWtage'), pattern = c("*fishCatch.rds", "*fishObsAgeComp.rds", "*fishObsLenComp.rds", "*fishObsWtAtAge.rds", "*fishObsFullAgeComp.rds", "*fishObsFullWtAtAge.rds")) fisheries <- list.files(path=dir, pattern = as.character(datlook$pattern[datlook$dattype %in% type]), full.names = TRUE) fishery.name <- str_match(fisheries, paste0(scenario.name,"_\\s*(.*?)\\s",datlook$pattern[datlook$dattype==type]))[,2] dat <- lapply(fisheries, readRDS) names(dat) <- fishery.name return(dat) } catchbio_ss <- read_savedfisheries(d.name, 'Catch') for(c in names(catchbio_ss)){ #arrange into wide format: year, Species1, Species2 ... and write csv catchbio <- catchbio_ss[[c]][[1]] %>% dplyr::mutate(year = time/365) %>% dplyr::select(species, year, atoutput) %>% tidyr::spread(species, atoutput) %>% write_csv(paste0(o.name,"observation_catch_NOBA_",c,".csv")) } #write readme for metadata meta <- c("Simulated data from Norweigan-Barents Atlantis Model by Hansen et al. \n\n", paste("Atlantis outputs stored in", d.name, '\n\n'), paste("Biomass output for surveys", names(survObsBiom), '\n\n'), paste("Fishery catch output for fisheries", names(catchbio_ss), '\n\n'), "Biomass and catch output units are annual tons.\n\n", "Survey specifications:\n", "----------------------\n", read_file(here("/simulated-data/config/mssurvey_fall.R")), "\n", "----------------------\n", read_file(here("/simulated-data/config/mssurvey_spring.R")), "\n", "----------------------\n", read_file(here("/simulated-data/config/mssurvey_fall_01.R")), "\n", "----------------------\n", read_file(here("/simulated-data/config/mssurvey_spring_01.R")), "\n\n", "Fishery specifications:\n", "----------------------\n", read_file(here("/simulated-data/config/msfishery.R")), "\n", "----------------------\n", read_file(here("/simulated-data/config/msfishery_01.R")), "\n" ) cat(as.character(meta), file=paste0(here("simulated-data/msinput/hydra/"),"README.txt"))
Survey and catch datasets in Gavin's suggested format (all surveys together, all fleets together). As above, note that this is just a test. The folder has two different survey and two different fishery configurations; they are not independent. Also, fishery catch is not currently separated into fleets. We wouldn't use all of these in an input file.
#read in survey biomass data survObsBiom <- read_savedsurvs(d.name, 'survB') #reads in all surveys # get cvs from config files svcon <- list.files(path=here("/simulated-data/config/"), pattern = "*survey*", full.names = TRUE) fishcon <- list.files(path=here("/simulated-data/config/"), pattern = "*fishery*", full.names = TRUE) omlist_ss <- NOBAom_ms source(here("/simulated-data/config/omdimensions.R")) svcvlook <- tibble() for(c in 1:length(svcon)){ source(svcon[c]) surv_cv_n <- surv_cv %>% mutate(survey=survey.name) svcvlook <- bind_rows(svcvlook, surv_cv_n) } fcvlook <- tibble() for(c in 1:length(fishcon)){ source(fishcon[c]) fish_cv_n <- fish_cv %>% mutate(fishery=fishery.name) fcvlook <- bind_rows(fcvlook, fish_cv_n) } allsvbio <- tibble() #multiple surveys named in list object, use as filename for(s in names(survObsBiom)){ #arrange into wide format: year, Species1, Species2 ... and write csv svbio <- survObsBiom[[s]][[1]] %>% dplyr::mutate(year = ceiling(time/stepperyr)) %>% dplyr::select(species, year, atoutput) %>% dplyr::rename(biomass = atoutput) %>% dplyr::mutate(survey = s) %>% dplyr::left_join(svcvlook) %>% dplyr::select(survey, year, species, everything()) %>% dplyr::arrange(survey, species, year) allsvbio <- bind_rows(allsvbio, svbio) } write_csv(allsvbio, paste0(o.name,"observation_biomass_NOBA_allsurvs.csv")) allcatch <- tibble() for(f in names(catchbio_ss)){ catchbio <- catchbio_ss[[f]][[1]] %>% dplyr::mutate(year = time/365) %>% dplyr::select(species, year, atoutput) %>% dplyr::rename(catch = atoutput) %>% dplyr::mutate(fishery = f) %>% dplyr::mutate(area = 1) %>% dplyr::left_join(fcvlook) %>% dplyr::select(fishery, area, year, species, everything()) %>% dplyr::arrange(fishery, species, year) allcatch <- bind_rows(allcatch, catchbio) } write_csv(allcatch, paste0(o.name,"observation_catch_NOBA_allfisheries.csv"))
Output temperature data from surveys using the "Temp" variable in the .nc file and the `atlantisom::load_nc_physics` function. A survey (timestep, polygons, possibly observation error) can then be applied to this output.
We can get surface and bottom temp from Atlantis using the layers from each polygon, with layer 0 being the bottom and the maximum layer in the polygon being the surface (according to the Atlantis wiki/Atlantis output files/Atlantis NetCDF structure).
# taken from existing load_nc to explore # file.nc <- file.path(d.name, "outputnordic_runresults_01.nc") # # # Load ATLANTIS output! # at_out <- RNetCDF::open.nc(con = file.nc) # # # Get info from netcdf file! (Filestructure and all variable names) # var_names_ncdf <- sapply(seq_len(RNetCDF::file.inq.nc(at_out)$nvars - 1), # function(x) RNetCDF::var.inq.nc(at_out, x)$name) # n_timesteps <- RNetCDF::dim.inq.nc(at_out, 0)$length # n_boxes <- RNetCDF::dim.inq.nc(at_out, 1)$length # n_layers <- RNetCDF::dim.inq.nc(at_out, 2)$length # # RNetCDF::close.nc(at_out) #'@physic_variables A character value spefifying which variables #' should be loaded. Only one variable can be loaded at a time. #' Currently, only the following variables are allowed: #' "salt", "NO3", "NH3", "Temp", "Oxygen", "Si", "Det_Si", "DON", #' "Chl_a", "Denitrifiction", "Nitrification". truetemp <- atlantisom::load_nc_physics(dir = d.name, file_nc = "outputnordic_runresults_01.nc", physic_variables = "Temp", aggregate_layers = FALSE, bboxes = atlantisom::get_boundary(NOBAom_ms$boxpars)) #if(verbose) message("Temperature read in.") # this averages all depth layers, don't want this # truetempagg <- atlantisom::load_nc_physics(dir = d.name, # file_nc = "outputnordic_runresults_01.nc", # physic_variables = "Temp", # aggregate_layers = TRUE, # bboxes = atlantisom::get_boundary(NOBAom_ms$boxpars)) truebottomtemp <- truetemp %>% filter(layer == 0) # according to Atlantis wiki, layer 0 is bottom truesurfacetemp <- truetemp %>% filter(layer<7) %>% #layer 7 is always identical temp to layer 0, may be the added sediment layer? group_by(polygon, time) %>% filter(layer==max(layer)) %>% ungroup surveyenv <- function(trueenv, survboxes, survtimes) { surveydat <- trueenv %>% dplyr::filter(polygon %in% survboxes) %>% dplyr::filter(time %in% survtimes) return(surveydat) } omlist_ss <- NOBAom_ms source(here("simulated-data/config/omdimensions.R")) #need to weight by polygon area to get mean annual temp time series for each survey boxarea <- map_dfr(NOBAom_ms$boxpars$boxes, "area") %>% pivot_longer(everything(), names_to = "polygon", values_to = "area") %>% mutate(polygon = as.integer(polygon)) %>% mutate(proparea = area/sum(area)) source(here("simulated-data/config/mssurvey_fall.R")) fallcensus_bottomtemp <- surveyenv(truebottomtemp, survboxes = survboxes, survtimes = survtime) fallcensus_btempindex <- left_join(fallcensus_bottomtemp, boxarea) %>% dplyr::group_by(time) %>% dplyr::summarise(meantemp = weighted.mean(atoutput, proparea)) %>% dplyr::mutate(year = ceiling(time/stepperyr)) %>% dplyr::select(year, meantemp) %>% write_csv(paste0(o.name,"observation_temperature_NOBA_",survey.name,".csv")) source(here("simulated-data/config/mssurvey_spring.R")) springcensus_bottomtemp <- surveyenv(truebottomtemp, survboxes = survboxes, survtimes = survtime) springcensus_btempindex <- left_join(springcensus_bottomtemp, boxarea) %>% dplyr::group_by(time) %>% dplyr::summarise(meantemp = weighted.mean(atoutput, proparea)) %>% dplyr::mutate(year = ceiling(time/stepperyr)) %>% dplyr::select(year, meantemp) %>% write_csv(paste0(o.name,"observation_temperature_NOBA_",survey.name,".csv")) source(here("simulated-data/config/mssurvey_fall_01.R")) fallnearbox_bottomtemp <- surveyenv(truebottomtemp, survboxes = survboxes, survtimes = survtime) fallnearbox_btempindex <- left_join(fallnearbox_bottomtemp, boxarea) %>% dplyr::group_by(time) %>% dplyr::summarise(meantemp = weighted.mean(atoutput, proparea)) %>% dplyr::mutate(year = ceiling(time/stepperyr)) %>% 
dplyr::select(year, meantemp) %>% write_csv(paste0(o.name,"observation_temperature_NOBA_",survey.name,".csv")) source(here("simulated-data/config/mssurvey_spring_01.R")) springnearbox_bottomtemp <- surveyenv(truebottomtemp, survboxes = survboxes, survtimes = survtime) springnearbox_btempindex <- left_join(springnearbox_bottomtemp, boxarea) %>% dplyr::group_by(time) %>% dplyr::summarise(meantemp = weighted.mean(atoutput, proparea)) %>% dplyr::mutate(year = ceiling(time/stepperyr)) %>% dplyr::select(year, meantemp) %>% write_csv(paste0(o.name,"observation_temperature_NOBA_",survey.name,".csv"))