In NOAA-EDAB/ms-keyrun: All information for keyrun review

knitr::opts_chunk$set(echo = TRUE, 
                      message = FALSE,
                      warning = FALSE)

#library(googledrive)
library(pacman)
p_load(repmis, RCurl, DT, magrittr, gsubfn, stringr, tidyverse)

Species lists

(We first pull species lists from the Rpath model and load the survey results.)

# Sean's file from https://github.com/NOAA-EDAB/GBRpath/blob/master/data/SOE_species_list.RData
# loads the RData as object "species"

invisible(source_data("https://github.com/NOAA-EDAB/GBRpath/blob/master/data/SOE_species_list.RData?raw=True"))

# used to get a list to paste into google form
#write.table(file = "spplist.txt", sort(unique(species$RPATH)))

rpathspp <- species %>%
  select(RPATH, COMNAME, SCINAME, SVSPP) %>%
  filter(RPATH != "") %>%
  distinct() 

aggs <- rpathspp %>%
  count(RPATH, sort = TRUE, name = "Nspp") %>%
  filter(Nspp > 1)

singlespp <- anti_join(rpathspp, aggs) %>%
  mutate(Nspp = 1)

rpathlist <- full_join(singlespp, aggs)

#survey link https://forms.gle/gfA3tKFKRBxf3Mu38
#results link https://docs.google.com/spreadsheets/d/1sckY46jdyBa1Fhydw2syBawkjGPcHX4YwQmDCDBXcGU/edit#gid=284425233

# resultfile <- drive_find(pattern = "MS-Keyrun Species", type = "spreadsheet")
# 
# responses <- drive_download(resultfile, type = "csv", overwrite = TRUE) %>%
#   {read.csv(.$local_path)} 

responses <- read.csv("supportingFiles/MS-Keyrun Species, Environment, Data Format (Responses) - Form Responses 1.csv",header=T)

names(responses)[3] <- "Species"
names(responses)[5] <- "Environment"
names(responses)[7] <- "ModelInR"
names(responses)[8] <- "ModelUseR"
names(responses)[9] <- "RData"

Multispecies models (focal species) {#focalSpecies}

Multispecies models focus on a subset of interacting species. They will estimate population parameters based on fits to biomass, catch, and (if appropriate) size or age information, establish reference points, and evaluate status for the interacting species.

Decision: Use the top ranked 10 species in the table below: Atlantic herring, Atlantic cod, goosefish, haddock, spiny dogfish, winter founder, yellowtail flounder, Atlantic mackerel, silver hake, and winter skate. Rationale: These species occupy Georges Bank while pollock is found more in the neighboring Gulf of Maine.

These species received >2 vote for inclusion in MS-Keyrun multispecies models:

# each row is a response with a comma delimited list of species
# string split each row to species
# tally by species
# histogram

msspecies <- data.frame(Species = responses$Species,
                        Model = c("MS", "MS", "NA", "MS", "Rpath", "MS")) %>%
  filter(Model != "NA") %>%
  separate_rows(Species, sep = ",") %>%
  mutate(Species = str_trim(Species, side = "both"))

focal <- msspecies %>%
  filter(Model == "MS") %>%
  count(Species) %>%
  filter(n > 2) %>%
  left_join(rpathspp, by = c("Species" = "RPATH")) %>%
  select(-COMNAME) %>%
  arrange(desc(n))

knitr::kable(focal)

Species modeled by all compared multispecies models could include all of these, or could be limited to a subset.

Food web model (focal + rest of system)

The food web model is intended to examine wider ecosystem responses to management measures applied to a subset of interacting species. It will estimate predator-prey interaction and other parameters based on fits to biomass, catch, and diet information. This model should track general trends (if not interannual variability) for focal species and produce reasonable reactions by their predators and prey.

Current species groups in the GB Rpath model (all retained for MS-Keyrun):

datatable(rpathlist, rownames = FALSE, 
          options = list(pageLength = 25, 
                         order = list(list(0, 'asc'))
          ))

Envrionmental data

Hydra, the length-based multispecies model, already incorporates a time series of Georges Bank bottom temperature as a covariate in simulation mode to force changes in growth, maturity, and recruitment. For MS-Keyrun, we can explore fitting the parameters governing the influence of this covariate.

Rpath can be directly forced by a primary production time series, and temperature-based mediation functions can be included (but it is unclear whether parameters can be fit for these functions).

Two mulispecies models are not currently set up to incorporate environmental data: MSCAA, MSSPM/Kraken

The WHAM model can potentially incorporate any environmental time series that can influence recruitment or mortality.

These environmental time series received >1 vote for inclusion in MS-Keyrun multispecies and Rpath models:

# each row is a response with a comma delimited list of environmental data
# string split each row to dataset name
# tally by dataset
# histogram

msenv <- data.frame(Environment = responses$Environment,
                        Model = c("MS", "MS", "NA", "MS", "Rpath", "MS")) %>%
  filter(Model != "NA") %>%
  separate_rows(Environment, sep = ",") %>%
  mutate(Environment = str_trim(Environment, side = "both"))

includedenv <- msenv %>%
  count(Environment) %>%
  filter(n > 1) %>%
  arrange(desc(n))

knitr::kable(includedenv)

Decision: We will include GB temperature and primary production time series as part of the MS-Keyrun dataset. Modelers may retrieve additional data from ecodata.

The ecodata R package has up to date information at the Georges Bank EPU scale as listed here: https://noaa-edab.github.io/ecodata/landing_page.

Current time series in ecodata:

#remotes::install_github("noaa-edab/ecodata",build_vignettes=TRUE)
library(ecodata)

ecodsets <- data(package = "ecodata")$result[, c("Item", "Title")]

datatable(ecodsets, rownames = FALSE, 
          options = list(pageLength = 25)
)

Programming frameworks and data formats

Most models are either implemented in R or can work directly with R datasets:

# these are multiple choice yes/no so just plot

useR <- data.frame(ModelInR = responses$ModelInR,
                   ModelUseR = responses$ModelUseR,
                   RData = responses$RData) %>%
  gather(question, answer, ModelInR:RData) %>%
  mutate(shortans = ifelse(is.na(word(answer, 1, 2)), 
                           answer, 
                           word(answer, 1, 2)))
yesR <- useR %>%
  group_by(question) %>%
  count(answer=="Yes") %>%
  mutate(tot = sum(n)) %>%
  mutate(percent = n/tot*100) 

knitr::kable(yesR)

Decision: All surveyed agreed that we should maintain datasets within an R data package for the MS-Keyrun project.

See discussion of pros and cons of using R data package for standardizing input datasets and making them available.

Full survey results

Survey conducted July 13-29 2020

This is the survey form.

We had r nrow(responses) responses.

Species

Responses:

ggplot(msspecies, aes(Species)) + 
  geom_bar() +
  facet_wrap(~Model, nrow = 2, scales="free") +
  theme(axis.text.x = element_text(angle = 90))

knitr::kable(responses$Rationale.for.species.included, col.names = "Species rationale", booktabs=TRUE)

Environment

Responses:

ggplot(msenv, aes(Environment)) + 
  geom_bar() +
  facet_wrap(~Model, nrow = 2, scales="free") +
  theme(axis.text.x = element_text(angle = 90))

knitr::kable(responses$Rationale.for.environmental.time.series.included, col.names = "Environment rationale", booktabs=TRUE)

Model programming frameworks

Many but not all models interface with R and can use R datasets. Even for those that do not, all agreed on using an R data package for standardizing input datasets and making them available.

Responses:

ggplot(useR, aes(question)) +
  geom_bar(aes(fill=shortans)) +
  labs(x = "Question", 
       fill = "Short Answer") +
  scale_x_discrete(labels = function(x) str_wrap(x, width = 35))