In KevinSee/QRFcapacity: Carrying Capacity with QRF

library(knitr)
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
knitr::opts_chunk$set(echo=FALSE, warning=FALSE, message=FALSE)

library(pander)
# options for table formatting
panderOptions('big.mark', ',')
# panderOptions('digits', 3)
# panderOptions('round', 3)
panderOptions('keep.trailing.zeros', FALSE)
panderOptions('table.split.table', Inf)

library(captioner)
tab_nums = captioner(prefix = 'Table')
fig_nums = captioner()

# load packages for analysis
library(QRFcapacity)
library(maptools)
library(tidyverse)
library(janitor)
library(magrittr)
library(minerva)
library(sf)
library(quantregForest)
library(survey)
library(MuMIn)


# set default theme for ggplot
theme_set(theme_bw())
# setwd('reports')

# determine which set of fish/habitat data to use
# data("fh_sum_dash_2014_17")
# fish_hab = fh_sum_dash_2014_17 %>%
#   mutate_at(vars(Watershed, Year),
#             list(as.factor))

data("fh_sum_champ_2017")
fish_hab = fh_sum_champ_2017 %>%
  mutate_at(vars(Watershed, Year),
            list(as.factor))

# and the appropriate habitat dictionrary to go with it
data("hab_dict_2017")
hab_dict = hab_dict_2017

# all the related habitat data
# data('champ_dash')
# hab_data = champ_dash
# 
# data('champ_dash_avg')
# hab_avg = champ_dash_avg

data("champ_site_2011_17")
hab_data = champ_site_2011_17

data("champ_site_2011_17_avg")
hab_avg = champ_site_2011_17_avg

# add temperature metrics
data("champ_temps")
hab_avg %<>%
  left_join(champ_temps %>%
              as_tibble() %>%
              select(Site, avg_aug_temp = S2_02_11) %>%
              distinct())


hab_data %<>%
  left_join(hab_data %>%
              select(VisitID, Year = VisitYear) %>%
              distinct() %>%
              left_join(champ_temps %>%
                          as_tibble() %>%
                          select(VisitID, avg_aug_temp = S2_02_11))) %>%
  left_join(hab_data %>%
              select(VisitID, Year = VisitYear) %>%
              distinct() %>%
              left_join(champ_temps %>%
                          as_tibble() %>%
                          select(Site:VisitID, S1_93_11:S36_2015) %>%
                          gather(scenario, aug_temp, S1_93_11:S36_2015) %>%
                          mutate(Year = str_sub(scenario, -4)) %>%
                          mutate_at(vars(Year),
                                    list(as.numeric)) %>%
                          filter(!is.na(Year)) %>%
                          select(Site:VisitID, Year, aug_temp))) %>%
  select(-Year)

Introduction

We used quantile random forests (@Meinshausen2006) for this investigation because random forests allow for non-linear fish-habitat relationships, naturally incorporate any correlation or interactions between habitat covariates, and allow us to examine a predicted quantile (@Cade2003), rather than only the mean of the predicted response.

Data

The fish data was collected by various agencies during the summer low-flow season at many of the same sites surveyed using the CHaMP protocol. Survey methods included mark-recapture, three-pass removal sampling, two-pass removal sampling, and single-pass electrofishing, as well as snorkeling. We estimated juvenile fish abundance (and density) at all sites where fish survey data were available. Three-pass removal estimates used the Carle-Strub estimator [@Carle1978], following advice from @Hedger2013. Two-pass removal estimates used the estimator described by @Seber2002. Mark-recapture estimates used Chapman’s modified Lincoln-Peterson estimator [@Chapman1951] and were deemed valid if they met the criteria described in @Robson1964. These estimates were made using the removal and mrClosed functions from the FSA package [@Ogle2017] in R software [@Rsoftware2015]. Snorkel counts were transformed to abundance estimates using paired snorkel-electrofishing sites to calibrate snorkel counts.

Many sites were sampled in multiple years. To avoid any chance of pseudo-replication (since habitat data is not usually expected to vary much year-to-year), for any site with multiple surveys, we chose the survey with the highest estimated density. We used this criteria because we are interested in estimating carrying capacity, so using the highest observed density seemed appropriate.

All sites fell within the range of steelhead, but many of them were outside the range of spring/summer Chinook salmon. However, defining the range of sp/su Chinook is tricky, and it is unclear if the reported data for Chinook salmon includes only sites considered with the range of Chinook, or if we should exclude data from some sites based on our best understanding of the species' range. Currently, we are excluding sites outside the Chinook domain. We created our own list of sites in the John Day that were within a Chinook domain, based on which sites Chinook were found at some point (and downstream of those).

# library(maptools)
data("chnk_domain")

# which sites were sampled for Chinook? 
chnk_samps = fish_hab %>%
  filter(Species == 'Chinook') %>%
  select(Site:Lon, N) %>%
  distinct() %>%
  st_as_sf(coords = c('Lon', 'Lat'),
           crs = 4326) %>%
  st_transform(st_crs(chnk_domain))

# set snap distance (in meters)
st_crs(chnk_samps)
snap_dist = 1000

# which of those sites are in Chinook domain?
chnk_sites = chnk_samps %>%
  as_Spatial() %>%
  snapPointsToLines_v2(points = .,
  # maptools::snapPointsToLines(points = .,
                              chnk_domain %>%
                                mutate(id = 1:n()) %>%
                                select(id, MPG) %>%
                                as_Spatial(),
                              maxDist = snap_dist,
                              withAttrs = T,
                              idField = 'id') %>%
  as('sf') %>%
  select(-nearest_line_id, -snap_dist) %>%
  # include some sites in the John Day where Chinook were found (or seemed to be close to sites where Chinook were found)
  rbind(st_read('../data/raw/domain/Chnk_JohnDay_TrueObs.shp',
                quiet = T) %>%
          st_transform(st_crs(chnk_domain)) %>%
          select(-in_range))

# ggplot() +
#   geom_sf(data = chnk_samps,
#           aes(color = 'Sampled')) +
#   geom_sf(data = chnk_sites,
#           aes(color = 'In Chnk Range')) +
#   theme(axis.text = element_blank())

fish_hab %<>%
  filter(Species == 'Steelhead' |
           (Species == 'Chinook' & Site %in% chnk_sites$Site))

After matching up the fish abundances with sites where we have habitat data, we are left with the following number of sites.

fish_hab %>%
  xtabs(~ Watershed + Species + Year, .) %>%
  addmargins(margin = c(1, 3)) %>%
  pander()

fish_hab %>%
  tabyl(Watershed, Year, Species) %>%
  adorn_totals(c('row', 'col')) %>%
  adorn_percentages('col') %>%
  adorn_pct_formatting(digits = 0,
                       rounding = 'half up') %>%
  adorn_ns %>%
  adorn_title %>%
  knitr::kable()

Selecting Covariates

A key step in developing a QRF model to predict fish capacities is selecting the habitat covariates to include in the model. Random forest models naturally incorporate interactions between correlated covariates, which is essential since nearly all habitat variables are considered correlated to one degree or another. However, we aimed to avoid overly redundant variables (i.e., variables that measure similar aspects of the habitat). Further, including too many covariates can result in overfitting of the model (e.g., including as many covariates as data points).

Methods

We used the Maximal Information-Based Nonparametric Exploration (MINE) class of statistics (@Reshef2011) to determine those habitat characteristics (covariates) most highly associated with observed parr densities. We calculated the maximal information coefficient (MIC), using the R package minerva (@Albanese2013), to measure the strength of the linear or non-linear association between two variables (@Reshef2011). The MIC value between each of the measured habitat characteristics and the response variable, juvenile fish density (fish/m), was used to inform decisions on which habitat covariates to include in the QRF parr capacity model.

Habitat metrics were first grouped into broad categories that included channel unit, complexity, cover, disturbance, riparian, size, substrate, temperature, water quality, and woody debris. Within each category, metrics were ranked according to their MIC value (Figure 1). Our strategy was to select one or two variables with the highest MIC score within each category so that covariates describe different aspects of rearing habitat (e.g., substrate, temperature, etc.).

# # what are some possible habitat covariates?
poss_hab_mets = hab_dict %>%
  filter(MetricCategory != 'Categorical') %>%
  filter(ShortName %in% names(fish_hab)) %>%
  pull(ShortName) %>%
  unique()

# pull from which CHaMP metrics could be generated from DASH
# poss_hab_mets = read_csv('../data/prepped/hab_dict_champ_dash.csv') %>%
#   select(-starts_with('X')) %>%
#   filter(DASH_gen) %>%
#   filter(MetricCategory != 'Categorical') %>%
#   pull(ShortName) %>%
#   unique()
poss_hab_mets = c(poss_hab_mets, 'aug_temp', 'avg_aug_temp') %>%
  unique()

mine_res = fish_hab %>%
  split(list(.$Species)) %>%
  map_df(.id = 'Species',
         .f = function(x) {
           x %>%
             mutate(fish_dens = log(fish_dens + 0.005)) %>%
             estimate_MIC(covars = poss_hab_mets,
                          response = 'fish_dens')
         }) %>%
  left_join(hab_dict %>%
              filter(MetricGroupName == 'Visit Metric') %>%
              select(Metric = ShortName,
                     MetricCategory,
                     Name),
            by = 'Metric') %>%
  mutate_at(vars(MetricCategory),
            list(fct_explicit_na),
            na_level = 'Other') %>%
  mutate_at(vars(Name),
            list(as.character)) %>%
  mutate(Name = if_else(is.na(Name),
                        as.character(Metric),
                        Name)) %>%
  # put the metric names in descending order by MIC
  mutate_at(vars(Metric, Name),
            list(~ fct_reorder(., .x = MIC))) %>%
  select(Species, MetricCategory, Metric, everything()) %>%
  arrange(Species, MetricCategory, desc(MIC))

mine_plot_df = mine_res #%>%
  # filter out some metrics with too many NAs or 0s
  # filter((perc_NA < 0.2 & non_0 > 100) | MetricCategory == 'Temperature') #%>%
  # # filter out metrics with very low variance
  # filter(var < 0.1) %>% select(1:11)
  # # filter out area and volume metrics
  # filter(!grepl('Area', Metric),
  #        !grepl('Vol', Metric),
  #        Metric != 'Lgth_Wet')

library(corrr)
#----------------------------------------------
# Look at correlations between habitat metrics
#----------------------------------------------
# top metrics
sel_mets = poss_hab_mets
sel_mets = mine_plot_df %>%
  group_by(MetricCategory) %>%
  slice(1:5) %>%
  ungroup() %>%
  pull(Metric) %>%
  unique() %>%
  as.character()


corr_mat = hab_avg %>%
  select(one_of(sel_mets)) %>%
  corrr::correlate()

corr_mat %>%
  rearrange(absolute = F) %>%
  shave(upper = T) %>% 
  stretch() %>%
  filter(!is.na(r)) %>%
  filter(abs(r) > 0.5) %>%
  kable()

corr_p1 = corr_mat %>%
  # rearrange(absolute = F) %>%
  shave(upper = T) %>% 
  rplot(legend = T,
        print_cor = T)

corr_p2 = network_plot(corr_mat)

Results

The MINE results are shown in the plots below.

# make a plot of MIC values for all species
mine_p = mine_plot_df %>%
  ggplot(aes(x = Name,
             y = MIC,
             fill = Species)) +
  geom_col(position = position_dodge(1)) +
  coord_flip() +
  facet_wrap(~ MetricCategory,
             scales = 'free_y',
             ncol = 3) +
  scale_fill_brewer(palette = 'Set1',
                    guide = guide_legend(nrow = 1)) +
  theme(legend.position = 'bottom',
        axis.text = element_text(size = 5))

mine_p

mine_p2 = mine_plot_df %>%
  ggplot(aes(x = Name,
             y = MIC,
             fill = Species)) +
  geom_col(position = position_dodge(1)) +
  coord_flip() +
  scale_fill_brewer(palette = 'Set1',
                    guide = guide_legend(nrow = 1)) +
  theme(legend.position = 'bottom',
        axis.text = element_text(size = 5))

mine_p2

In the end, we decided to use the same metrics for both species, and chose the following metrics:

# sel_hab_mets = crossing(Species = c('Chinook', 
#                                     'Steelhead'),
#                         Metric = c('CU_Freq',
#                                    'DpthThlwg_UF_CV',
#                                    'WetWDRat_Avg',
#                                    'FishCovTotal',
#                                    'DistPrin1',
#                                    'RipCovGrnd',
#                                    'DpthThlwg_Avg',
#                                    'SubEstGrvl',
#                                    'SubD50',
#                                    'SubLT6',
#                                    'AvgHourly',
#                                    'Max7dAM',
#                                    'Cond',
#                                    'LWFreq_Wet'))
# 
# 
# sel_hab_mets = crossing(Species = c('Chinook', 
#                                     'Steelhead'),
#                         Metric = poss_hab_mets)

# based on conversation with Mike and Richie
sel_hab_mets = crossing(Species = c('Chinook', 
                                    'Steelhead'),
                        Metric = c('UcutArea_Pct',
                                   'FishCovNone',
                                   'SubEstGrvl',
                                   'FstTurb_Freq',
                                   'FstNT_Freq',
                                   'CU_Freq',
                                   'SlowWater_Pct',
                                   'NatPrin1',
                                   'DistPrin1',
                                   'avg_aug_temp',
                                   # 'Sin_CL',
                                   'Sin',
                                   'WetWdth_CV',
                                   'WetBraid',
                                   'WetSC_Pct',
                                   'Q',
                                   'WetWdth_Int',
                                   'LWFreq_Wet',
                                   'LWVol_WetFstTurb'))

sel_hab_mets %>%
  select(ShortName = Metric) %>%
  distinct() %>%
  left_join(hab_dict) %>%
  select(Catg = MetricCategory, 
         Metric = Name, 
         Description = DescriptiveText) %>%
  pander()

Fit QRF Models

Methods

Random forest models cannot handle missing data by default, so the first step is to impute any missing data. In order to limit the amount of imputation necessary, we first delete any observation with too many missing values, and then impute the rest.

qrf_mod_df = fish_hab %>%
  split(list(.$Species)) %>%
  map_df(.id = 'Species',
         .f = function(x) {
    spp = unique(x$Species)

    covars = sel_hab_mets %>%
      filter(Species == spp) %>%
      pull(Metric)

    data = impute_missing_data(data = x %>%
                                 mutate_at(vars(Watershed, Year),
                                           list(fct_drop)),
                               covars = covars,
                               impute_vars = c('Watershed', 'Elev_M', 'Sin', 'Year', 'CUMDRAINAG'),
                               method = 'missForest') %>%
      select(Site, Watershed, Year, LON_DD, LAT_DD, fish_dens, VisitID, one_of(covars))

    return(data)

  })

When fitting the QRF model, we would like to accomodate actual 0's so we offset the density by a small positive value. We then log-transformed the resulting fish density, and used that as the response. We then fit a random forest model using the quantregForest package (-@Meinshausen2012).

# set the density offset (to accommodate 0z)
dens_offset = 0.005

# fit random forest models
set.seed(4)
qrf_mods = qrf_mod_df %>%
  split(list(.$Species)) %>%
  map(.f = function(z) {

    covars = sel_hab_mets %>%
      filter(Species == unique(z$Species)) %>%
      pull(Metric)

    set.seed(3)
    qrf_mod = quantregForest(x = z %>%
                               select(one_of(covars)) %>%
                               as.matrix,
                             y = z %>%
                               mutate_at(vars(fish_dens),
                                         list(~ log(. + dens_offset))) %>%
                               select(fish_dens) %>%
                               as.matrix(),
                             keep.inbag = T,
                             ntree = 1000)

    return(qrf_mod)
  })

The relative importance of each covariate for each species' model is shown below.

rel_imp_p = qrf_mods %>%
  map(.f = function(x) {
    as_tibble(x$importance,
              rownames = 'Metric') %>%
      mutate(relImp = IncNodePurity / max(IncNodePurity)) %>%
      left_join(hab_dict %>%
                  select(Metric = ShortName,
                         Name)) %>%
      mutate_at(vars(Metric, Name),
                list(~ fct_reorder(., relImp))) %>%
      arrange(Metric) %>%
      distinct() %>%
      ggplot(aes(x = Name,
                 y = relImp)) +
      geom_col(fill = 'gray40') +
      coord_flip() +
      labs(x = 'Metric',
           y = 'Relative Importance')

  })

ggpubr::ggarrange(plotlist = rel_imp_p,
                  nrow = 1,
                  ncol = 2,
                  labels = names(qrf_mods))

We then created partial dependence plots for each covariate, which show the marginal effect of each covariate on predicted capacities. These plots are made by holding all other metrics fixed at their average value, and allow the one metric to vary across it's range, while making predictions using the random forest model at many points along that gradient. We used the 90th quantile of the predictions at each point as a proxy for carrying capacity

# for Chinook
chnk_pdp = plot_partial_dependence(qrf_mods[[1]],
                                   qrf_mod_df %>%
                                     filter(Species == 'Chinook'),
                                   data_dict = hab_dict,
                                   scales = 'free_x')

# for steelhead
sthd_pdp = plot_partial_dependence(qrf_mods[[2]],
                                   qrf_mod_df %>%
                                     filter(Species == 'Steelhead'),
                                   data_dict = hab_dict,
                                   scales = 'free_x')

chnk_pdp

sthd_pdp

Predict at Habitat Sites

Methods

Direct estimates of capacity can be made at all the sites for which we have habitat data with the selected QRF covariates. At each site, the random forest model produces a number (e.g. 1000) of predictions of fish density based on those covariates. As a proxy for carrying capacity, we choose to use the 90th quantile of those predictions (@Cade2003).

Results

# what quantile is a proxy for capacity?
pred_quant = 0.9

wtsd = "Lemhi"
wtsd = 'Upper Grande Ronde'
wtsd = c('Upper Grande Ronde', 'Minam')
# wtsd = 'John Day'
qrf_mod_df %>%
  filter(Species == 'Chinook',
         Watershed %in% wtsd) %>%
  left_join(champ_dash %>%
              filter(Watershed %in% wtsd) %>%
              group_by(Site, StreamName) %>%
              summarise_at(vars(Lgth_Wet, Area_Wet),
                           list(mean))) %>%
  mutate(pred_cap = predict(qrf_mods[[1]],
                            newdata = select(., one_of(unique(sel_hab_mets$Metric))),
                            what = pred_quant)) %>%
  mutate(pred_cap = exp(pred_cap) - dens_offset,
         pred_cap = pred_cap * Lgth_Wet / Area_Wet) %>%
  st_as_sf(coords = c('LON_DD', 'LAT_DD'),
           crs = 4326) %>%
  st_transform(crs = 5070) %>%
  ggplot() +
  geom_sf(aes(color = pred_cap)) +
  scale_color_viridis_c(direction = -1) +
  theme(axis.text = element_blank()) +
  labs(color = expression(Parr / m^2),
       title = paste(wtsd, collapse = ' '))

Extrapolation to Other Sites

To estimate watershed capacities, we used QRF predictions at all the sites where we had habitat data, and used those as the response variable in an extrapolation model. The covariates for this model are globally available attributes (GAAs) which are attached to every master sample point.

# what quantile is a proxy for capacity?
pred_quant = 0.9

covars = unique(sel_hab_mets$Metric)
hab_impute = hab_avg %>%
  mutate_at(vars(Watershed, Channel_Type),
            list(fct_drop)) %>%
  impute_missing_data(data = .,
                      covars = covars,
                      impute_vars = c('Watershed', 
                                      'Elev_M', 
                                      'Channel_Type', 
                                      'CUMDRAINAG'),
                      method = 'missForest') %>%
  select(Site, Watershed, LON_DD, LAT_DD, VisitYear, Lgth_Wet, Area_Wet, one_of(covars))

pred_hab_sites = hab_impute %>%
  mutate(chnk_per_m = predict(qrf_mods[[1]],
                              newdata = select(., one_of(unique(sel_hab_mets$Metric))),
                              what = pred_quant),
         chnk_per_m = exp(chnk_per_m) - dens_offset,
         chnk_per_m2 = chnk_per_m * Lgth_Wet / Area_Wet)

# filter out sites outside the Chinook domain
data("chnk_domain")
snap_dist = 1000

pred_hab_sites_chnk = pred_hab_sites %>%
  filter(!is.na(LON_DD)) %>%
  st_as_sf(coords = c('LON_DD', 'LAT_DD'),
           crs = 4326) %>%
  st_transform(crs = st_crs(chnk_domain)) %>%
  as_Spatial() %>%
  maptools::snapPointsToLines(chnk_domain %>%
                                mutate(id = 1:n()) %>%
                                select(id, MPG) %>%
                                as_Spatial(),
                              maxDist = snap_dist,
                              withAttrs = T,
                              idField = 'id') %>%
  as('sf') %>%
  as_tibble() %>%
  select(-nearest_line_id, -snap_dist, -geometry) %>%
  # add sites in the John Day
  bind_rows(st_read('../data/raw/domain/Chnk_JohnDay_TrueObs.shp',
                    quiet = T) %>%
              as_tibble() %>%
              select(Site) %>%
              inner_join(pred_hab_sites))

Because most of our habitat data came from CHaMP sites, which were selected through a generalized random tesselation stratified (GRTS) sample design [@Stevens2003], we wanted to incorporate the design weights for each CHaMP site with a predicted carrying capacity.

#----------------------------------------
# pull in survey design related data
#----------------------------------------
# Calculate GRTS design weights.

# pull in info about what strata each CHaMP site was assigned to (using 2014 as reference year)
site_strata = pred_hab_sites_chnk %>%
  select(Site, Watershed) %>%
  distinct() %>%
  left_join(gaa %>%
              select(Site,
                     # CHaMPsheds,
                     strata = AStrat2014)) %>%
  mutate(site_num = str_split(Site, '-', simplify = T)[,2]) %>%
  mutate(strata = if_else(Watershed == 'Asotin',
                          site_num,
                          if_else(Watershed == 'Entiat' & grepl('ENT00001', Site),
                                  paste('EntiatIMW', site_num, sep = '_'),
                                  strata))) %>%
  mutate(strata = if_else(grepl('EntiatIMW', strata),
                          str_remove(strata, '[[:digit:]]$'),
                          strata),
         strata = if_else(grepl('EntiatIMW', strata),
                          str_remove(strata, '[[:digit:]]$'),
                          strata)) %>%
  filter(!is.na(strata)) %>%
  mutate(strata = paste(Watershed, strata, sep = '_')) %>%
  select(-site_num)

# read in data from the CHaMP frame
champ_frame_df = read_csv('../data/prepped/champ_frame_data.csv') %>%
  mutate(Target2014 = ifelse(is.na(AStrat2014), 'Non-Target', Target2014)) %>%
  mutate(AStrat2014 = ifelse(AStrat2014 == 'Entiat IMW', paste('EntiatIMW', GeoRchIMW, sep = '_'), AStrat2014)) %>%
  filter(Target2014 == 'Target') %>%
  rename(Watershed = CHaMPshed)

# what strata do we have?
frame_strata = champ_frame_df %>%
  mutate(strata = paste(Watershed, AStrat2014, sep='_')) %>%
  select(Watershed, 
         strata) %>%
  distinct()

# how long is each strata?
chnk_strata_length = champ_frame_df %>%
  filter(!is.na(UseTypCHSP)) %>%
  mutate(strata = paste(Watershed, AStrat2014, sep='_')) %>%
  select(Watershed, matches("Strat"), FrameLeng) %>%
  group_by(Watershed, strata) %>%
  summarise(tot_length_km = sum(FrameLeng) / 1000) %>%
  ungroup() %>%
  mutate_at(vars(Watershed, strata), 
            list(as.factor)) %>%
  arrange(Watershed, strata)

sthd_strata_length = champ_frame_df %>%
  filter(!is.na(UseTypSTSU)) %>%
  mutate(strata = paste(Watershed, AStrat2014, sep='_')) %>%
  select(Watershed, matches("Strat"), FrameLeng) %>%
  group_by(Watershed, strata) %>%
  summarise(tot_length_km = sum(FrameLeng) / 1000) %>%
  bind_rows(tibble(Watershed = 'Asotin',
                   strata = paste('Asotin', c('CC', 'NF', 'SF'), sep = '_'),
                   tot_length_km = 12)) %>%
  ungroup() %>%
  mutate_at(vars(Watershed, strata), 
            list(as.factor)) %>%
  arrange(Watershed, strata)


# how many sites in each strata? and what is the length of each strata?
strata_tab = pred_hab_sites_chnk %>%
  select(Site, Watershed, matches('per_m')) %>%
  left_join(site_strata) %>%
  filter(strata != 'Entiat_Entiat IMW') %>%
  mutate_at(vars(Watershed),
            list(fct_drop)) %>%
  group_by(Watershed, strata) %>%
  summarise(n_sites = n_distinct(Site)) %>%
  ungroup() %>%
  full_join(chnk_strata_length) %>%
  mutate(n_sites = if_else(is.na(n_sites),
                           as.integer(0),
                           n_sites)) %>%
  # calculate the weight of each site in each strata
  mutate(site_weight = if_else(n_sites > 0,
                               tot_length_km / n_sites,
                               as.numeric(NA)))


strata_test = frame_strata %>%
  full_join(strata_tab) %>%
  mutate_at(vars(Watershed),
            list(fct_drop)) %>%
  mutate(n_sites = if_else(is.na(n_sites),
                           as.integer(0),
                           n_sites))

# # what frame strata don't have any sites in them?
# strata_test %>%
#   filter(n_sites == 0,
#          !is.na(tot_length_km)) %>%
#   arrange(Watershed, strata) %>%
#   as.data.frame()

# # what strata that we have sites for are not in the frame strata?
# strata_test %>%
#   filter(n_sites > 0,
#          (is.na(tot_length_km) |
#             tot_length_km == 0)) %>%
#   as.data.frame()

# how much of each watershed is not accounted for with current sites / strata?
strata_test %>%
  group_by(Watershed) %>%
  summarise_at(vars(tot_length_km),
               list(sum),
               na.rm = T) %>%
  left_join(strata_test %>%
              filter(n_sites == 0) %>%
              group_by(Watershed) %>%
              summarise_at(vars(missing_length = tot_length_km),
                           list(sum),
                           na.rm = T)) %>%
  mutate_at(vars(missing_length),
            list(~ if_else(is.na(.), 0, .))) %>%
  mutate(perc_missing = missing_length / tot_length_km) %>%
  mutate_at(vars(perc_missing),
            list(~ if_else(is.na(.), 0, .))) %>%
  arrange(desc(perc_missing)) %>%
  pander()

We then selected a subset of the globally available attributes (GAAs) to use as extrapolation covariates. This was based on minimizing the correlations between the covariates. We also acknowlegdged which prediction sites had GAAs outside the range of values for the CHaMP sites.

We use the log of predicted capacity wherever we could make those predictions as the response variable, and this group of selected GAAs as the covariates. We then fit a linear model that incorporated the survey design weights by using the survey pacakge (-@Lumley2014) in R. We fit two main models, one that included CHaMP watershed as a covariate, and one that did not. The former was used to make predictions within all CHaMP watersheds where we had QRF predictions of carrying capacity. The latter was used at all other master sample points.

data(gaa)

gaa_all = gaa %>%
  rename(Channel_Type = ChanlType)

# possible covariates from among the GAAs
gaa_covars = c('TRange', 'GDD', 'Precip', 'Elev_M', 'CHaMPsheds', 'NatPrin1', 'NatPrin2', 'DistPrin1', 'BFW_M', 'SrtCumDrn', 'StrmPwr', 'Slp_NHD_v1', 'Channel_Type', 'MAVELV', 'WIDE_BF', 'S2_02_11')

# what type of covariate is each GAA?
gaa_class = gaa_all %>%
  select(one_of(gaa_covars)) %>%
  as.list() %>%
  map_chr(.f = class)

# which ones are numeric?
gaa_num = names(gaa_class)[gaa_class %in% c('integer', 'numeric')]
# which ones are categorical?
gaa_catg = names(gaa_class)[gaa_class %in% c('factor', 'character')]

# compare range of covariates from model dataset and prediction dataset
range_comp = bind_rows(gaa_all %>%
                         filter(!Site %in% unique(pred_hab_sites$Site)) %>%
                         select(Site,
                                one_of(gaa_num)) %>%
                         gather(Metric, value, -Site) %>%
                         mutate(Source = 'non-DASH Sites'),
                       gaa_all %>%
                         filter(Site %in% unique(pred_hab_sites$Site)) %>%
                         select(Site,
                                one_of(gaa_num)) %>%
                         distinct() %>%
                         gather(Metric, value, -Site) %>%
                         mutate(Source = 'DASH Sites')) %>%
  mutate_at(vars(Source, Metric),
            list(as.factor))

range_max = range_comp %>%
  group_by(Metric, Source) %>%
  summarise_at(vars(value),
               tibble::lst(min, max),
               na.rm = T) %>%
  filter(Source == 'DASH Sites') %>%
  ungroup() %>%
  gather(type, value, -Metric, -Source)

covar_range_p = range_comp %>%
  ggplot(aes(x = Source,
             y = value,
             fill = Source)) +
  geom_boxplot() +
  facet_wrap(~ Metric,
             scales = 'free') +
  geom_hline(data = range_comp %>%
               group_by(Metric, Source) %>%
               summarise_at(vars(value),
                            tibble::lst(min, max),
                            na.rm = T) %>%
               filter(Source == 'DASH Sites') %>%
               ungroup() %>%
               gather(type, value, min, max),
             aes(yintercept = value),
             lty = 2,
             color = 'darkgray') +
  theme_minimal()

covar_range_p

# look at correlations among GAAs
library(corrr)
corr_df = gaa_all %>%
  select(one_of(gaa_num)) %>%
  correlate()

corr_df %>%
  shave(upper = T) %>%
  stretch() %>%
  arrange(desc(abs(r))) %>%
  filter(abs(r) >= 0.5)

gaa_corr_p1 = corr_df %>%
  shave(upper = T) %>%
  rplot(legend = T,
        colors = c('blue', 'white', 'indianred2'),
        print_cor = T)

corr_mat = corr_df %>%
  select(-1) %>%
  as.matrix() 
rownames(corr_mat) = corr_df$rowname

library(ggcorrplot)
gaa_corr_p2 = ggcorrplot(corr_mat,
           lab = T,
           lab_size = 2,
           hc.order = T,
           colors = c('#2166AC', 'white', '#B2182B'),
           tl.cex = 8,
           type = 'full')

# reduce number of covariates bases on correlations
gaa_covars = c('TRange', 
               # 'GDD', 
               # 'Precip',
               'Elev_M', 
               'CHaMPsheds', 
               'NatPrin1', 
               # 'NatPrin2', 
               'DistPrin1', 
               # 'BFW_M', 
               'SrtCumDrn', 
               'StrmPwr',
               'Slp_NHD_v1', 
               'Channel_Type', 
               # 'MAVELV', 
               'WIDE_BF',
               'S2_02_11')

# what type of covariate is each GAA?
gaa_class = gaa_all %>%
  select(one_of(gaa_covars)) %>%
  as.list() %>%
  map_chr(.f = class)

# which ones are numeric?
gaa_num = names(gaa_class)[gaa_class %in% c('integer', 'numeric')]
# which ones are categorical?
gaa_catg = names(gaa_class)[gaa_class %in% c('factor', 'character')]

# Center the covariates
# filter out sites with covariates outside range of covariates used to fit extrapolation model
out_range_sites = gaa_all %>%
  # filter out a few areas
  filter(!HUC6NmNRCS %in% c('Upper Sacramento', 'Southern Oregon Coastal', 'Puget Sound', 'Northern California Coastal', 'Oregon Closed Basins')) %>%
  # don't use AEM sites in model
  filter(!grepl('^AEM', Site)) %>%
  select(one_of(gaa_num), Site) %>%
  gather(Metric, value, -Site) %>%
  left_join(select(range_max, -Source) %>%
              spread(type, value)) %>%
  group_by(Metric) %>%
  filter(value > max |
           value < min) %>%
  ungroup() %>%
  pull(Site) %>%
  unique()

# center covariates
gaa_summ = inner_join(pred_hab_sites %>%
                          select(Site) %>%
                          distinct(),
                        gaa_all %>%
                          select(Site, one_of(gaa_num))) %>%
  gather(GAA, value, -Site) %>%
  group_by(GAA) %>%
  summarise(metric_mean = mean(value, na.rm=T),
            metric_sd = sd(value, na.rm=T)) %>%
  ungroup()

# extrapolation model data set, with normalized covariates
mod_data = inner_join(pred_hab_sites_chnk %>%
                        select(Site:avg_aug_temp,
                               matches('per_m')),
                      gaa_all %>%
                        select(Site, one_of(gaa_num))) %>%
  gather(GAA, value, one_of(gaa_num)) %>%
  left_join(gaa_summ) %>%
  mutate(norm_value = (value - metric_mean) / metric_sd) %>%
  select(-(value:metric_sd)) %>%
  spread(GAA, norm_value) %>%
  left_join(gaa_all %>%
              select(Site, one_of(gaa_catg)))

mod_data %<>%
  bind_cols(mod_data %>%
              is.na() %>%
              as_tibble() %>%
              select(one_of(gaa_covars)) %>%
              transmute(n_NA = rowSums(.))) %>%
  filter(n_NA == 0) %>%
  mutate_at(vars(Watershed, CHaMPsheds, Channel_Type),
            list(fct_drop)) 


# where do we want to make extrapolation predictions?
gaa_pred = gaa_all %>%
  # filter out a few areas
  filter(!HUC6NmNRCS %in% c('Upper Sacramento', 'Southern Oregon Coastal', 'Puget Sound', 'Northern California Coastal', 'Oregon Closed Basins')) %>%
  # don't use AEM sites in model
  filter(!grepl('^AEM', Site)) %>%
  # don't use non-GRTS sites
  filter(SiteID_alt != 'NonGRTSSite' | is.na(SiteID_alt)) %>%
  filter(!grepl('mega', Site, ignore.case = T)) %>%
  # note which sites have GAAs outside range of CHaMP sites GAAs
  mutate(inCovarRange = ifelse(Site %in% out_range_sites, F, T)) %>%
  select(Site, one_of(gaa_covars), Lon, Lat, inCovarRange, HUC6NmNRCS, HUC8NmNRCS, HUC10NmNRC, HUC12NmNRC, chnk) %>%
  gather(GAA, value, one_of(gaa_num)) %>%
  left_join(gaa_summ) %>%
  mutate(norm_value = (value - metric_mean) / metric_sd) %>%
  select(-(value:metric_sd)) %>%
  spread(GAA, norm_value)

# calculate adjusted weights for all predicted QRF capacity sites
mod_data_weights = mod_data %>%
  left_join(site_strata) %>%
  left_join(strata_tab) %>%
  filter(!is.na(site_weight)) %>%
  group_by(Watershed) %>%
  mutate(sum_weights = sum(site_weight)) %>%
  ungroup() %>%
  mutate(adj_weight = site_weight / sum_weights)

#-------------------------------------------------------------
# Set up the survey design.

# getOption('survey.lonely.psu')
# this will prevent strata with only 1 site from contributing to the variance
# options(survey.lonely.psu = 'certainty')
# this centers strata with only 1 site to the sample grand mean; this is conservative
options(survey.lonely.psu = 'adjust')

# extrapolation model formula
full_form = as.formula(paste('log(qrf_cap) ~ -1 + (', paste(gaa_covars, collapse = ' + '), ')'))

# fit various models
model_svy_df = mod_data_weights %>%
  gather(response, qrf_cap, matches('per_m')) %>%
  select(-(n_sites:sum_weights)) %>%
  group_by(response) %>%
  nest() %>%
  mutate(design = map(data,
                      .f = function(x) {
                        svydesign(id = ~ 1,
                                  data = x,
                                  strata = ~ Watershed,
                                  # strata = ~ strata,
                                  weights = ~ adj_weight)
                      })) %>%
  mutate(mod_full = map(design,
                        .f = function(x) {
                          svyglm(full_form,
                                 design = x)
                        }),
         mod_no_champ = map(design,
                            .f = function(x) {
                              svyglm(update(full_form, .~ . -CHaMPsheds),
                                     design = x)
                            }))

# predictions within CHaMP watersheds, using the model that includes CHaMPsheds as a covariate
y = gaa_pred %>%
  filter(CHaMPsheds %in% unique(model_svy_df$data[[1]]$Watershed)) %>%
  select(Site, CHaMPsheds, one_of(gaa_covars)) %>%
  na.omit() %>%
  left_join(gaa_pred) %>%
  mutate_at(vars(Channel_Type),
            list(as.factor))

per_m_preds = predict(model_svy_df$mod_full[[1]],
                      newdata = y,
                      se = T,
                      type = 'response')

per_m2_preds = predict(model_svy_df$mod_full[[2]],
                      newdata = y,
                      se = T,
                      type = 'response')

y %<>%
  bind_cols(per_m_preds %>%
              as_tibble() %>%
              rename(chnk_per_m = response,
                     chnk_per_m_se = SE)) %>%
  bind_cols(per_m2_preds %>%
              as_tibble() %>%
              rename(chnk_per_m2 = response,
                     chnk_per_m2_se = SE)) %>%
  mutate_at(vars(matches('per_m')),
            list(exp))

rm(per_m_preds, per_m2_preds)

# predictions at all points, using the model without CHaMPsheds as a covariate
z = gaa_pred %>%
  select(Site, one_of(gaa_covars), -CHaMPsheds) %>%
  na.omit() %>%
  left_join(gaa_pred) %>%
  mutate_at(vars(Channel_Type),
            list(as.factor))

per_m_preds = predict(model_svy_df$mod_no_champ[[1]],
                      newdata = z,
                      se = T,
                      type = 'response')

per_m2_preds = predict(model_svy_df$mod_no_champ[[2]],
                      newdata = z,
                      se = T,
                      type = 'response')

z %<>%
  bind_cols(per_m_preds %>%
              as_tibble() %>%
              rename(chnk_per_m = response,
                     chnk_per_m_se = SE)) %>%
  bind_cols(per_m2_preds %>%
              as_tibble() %>%
              rename(chnk_per_m2 = response,
                     chnk_per_m2_se = SE)) %>%
  mutate_at(vars(matches('per_m')),
            list(exp))

rm(per_m_preds, per_m2_preds)

all_preds = y %>%
  mutate(model = 'CHaMP') %>%
  bind_rows(z %>%
              mutate(model = 'non-CHaMP'))

# quick comparison of capacity predicitons with both models
comp_pred_p = all_preds %>%
  filter(Site %in% Site[duplicated(Site)]) %>%
  select(Site, CHaMPsheds, Channel_Type, model, chnk_per_m, chnk_per_m2) %>%
  arrange(CHaMPsheds, Site, model) %>%
  gather(dens_type, cap, starts_with('chnk_per_m')) %>%
  spread(model, cap) %>%
  ggplot(aes(x = CHaMP,
             y = `non-CHaMP`)) +
  geom_point() +
  geom_abline(linetype = 2,
              color = 'red') +
  facet_wrap(~ CHaMPsheds + dens_type,
             scales = 'free')

# for sites in CHaMP watersheds, use predicions from CHaMP extrapolation model
all_preds %<>%
  filter((Site %in% Site[duplicated(Site)] & model == 'CHaMP') |
           !Site %in% Site[duplicated(Site)])

# extrapolation model formula
full_form = as.formula(paste('log(qrf_cap) ~ -1 + (', paste(gaa_covars, collapse = ' + '), ')'))
# extrap_mods = mod_data_list %>%
#   map(.f = function(x) {
#     mod_full = lm(full_form,
#                   data = x,
#                   na.action = na.fail)
#     
#     dd = dredge(mod_full)
#     mod_best = get.models(dd,
#                           subset = delta == 0)[[1]]
#     mod_avg = model.avg(dd,
#                         subset = cumsum(weight) < 0.96)
#   })

model_df = mod_data %>%
  filter(n_NA == 0) %>%
  gather(response, qrf_cap, matches('per_m')) %>%
  group_by(response) %>%
  nest() %>%
  mutate(mod_full = map(data,
                        .f = function(x) {
                          lm(full_form,
                             data = x)
                        }),
         mod_no_champ = map(data,
                        .f = function(x) {
                          lm(update(full_form, .~ . -CHaMPsheds),
                             data = x)
                        }))


# model_df = model_svy_df


# predictions within CHaMP watersheds, using the model that includes CHaMPsheds as a covariate
y = gaa_pred %>%
  filter(CHaMPsheds %in% unique(model_df$data[[1]]$Watershed)) %>%
  select(Site, CHaMPsheds, one_of(gaa_covars)) %>%
  na.omit() %>%
  left_join(gaa_pred) %>%
  mutate_at(vars(Channel_Type),
            list(as.factor))

per_m_preds = predict(model_df$mod_full[[1]],
                      newdata = y,
                      se.fit = T,
                      interval = 'confidence',
                      type = 'response')

per_m2_preds = predict(model_df$mod_full[[2]],
                      newdata = y,
                      se.fit = T,
                      interval = 'confidence',
                      type = 'response')

y %<>%
  bind_cols(per_m_preds$fit %>%
      as_tibble() %>%
      mutate(chnk_per_m_se = per_m_preds$se.fit) %>%
      select(chnk_per_m = fit,
             chnk_per_m_se,
             chnk_per_m_lwr = lwr,
             chnk_per_m_upr = upr)) %>%
  bind_cols(per_m2_preds$fit %>%
              as_tibble() %>%
              mutate(chnk_per_m2_se = per_m2_preds$se.fit) %>%
              select(chnk_per_m2 = fit,
                     chnk_per_m2_se,
                     chnk_per_m2_lwr = lwr,
                     chnk_per_m2_upr = upr)) %>%
  mutate_at(vars(matches('per_m')),
            list(exp))

# predictions at all points, using the model without CHaMPsheds as a covariate
z = gaa_pred %>%
  select(Site, one_of(gaa_covars), -CHaMPsheds) %>%
  na.omit() %>%
  left_join(gaa_pred) %>%
  mutate_at(vars(Channel_Type),
            list(as.factor))

per_m_preds = predict(model_df$mod_no_champ[[1]],
                      newdata = z,
                      se.fit = T,
                      interval = 'confidence',
                      type = 'response')

per_m2_preds = predict(model_df$mod_no_champ[[2]],
                      newdata = z,
                      se.fit = T,
                      interval = 'confidence',
                      type = 'response')

z %<>%
  bind_cols(per_m_preds$fit %>%
      as_tibble() %>%
      mutate(chnk_per_m_se = per_m_preds$se.fit) %>%
      select(chnk_per_m = fit,
             chnk_per_m_se,
             chnk_per_m_lwr = lwr,
             chnk_per_m_upr = upr)) %>%
  bind_cols(per_m2_preds$fit %>%
              as_tibble() %>%
              mutate(chnk_per_m2_se = per_m2_preds$se.fit) %>%
              select(chnk_per_m2 = fit,
                     chnk_per_m2_se,
                     chnk_per_m2_lwr = lwr,
                     chnk_per_m2_upr = upr)) %>%
  mutate_at(vars(matches('per_m')),
            list(exp))


all_preds = y %>%
  mutate(model = 'CHaMP') %>%
  bind_rows(z %>%
              mutate(model = 'non-CHaMP'))

# quick comparison of capacity predicitons with both models
comp_pred_p = all_preds %>%
  filter(Site %in% Site[duplicated(Site)]) %>%
  select(Site, CHaMPsheds, Channel_Type, model, chnk_per_m, chnk_per_m2) %>%
  arrange(CHaMPsheds, Site, model) %>%
  gather(dens_type, cap, starts_with('chnk_per_m')) %>%
  spread(model, cap) %>%
  ggplot(aes(x = CHaMP,
             y = `non-CHaMP`)) +
  geom_abline(linetype = 2,
              color = 'red') +
  geom_point() +
  facet_wrap(~ CHaMPsheds,
             scales = 'free')

# for sites in CHaMP watersheds, use predicions from CHaMP extrapolation model
all_preds %<>%
  filter((Site %in% Site[duplicated(Site)] & model == 'CHaMP') |
           !Site %in% Site[duplicated(Site)])

wtsd = 'Lemhi'
wtsd = 'South Fork Salmon'
wtsd = 'Lower Snake-Tucannon'
wtsd = 'Wenatchee'
wtsd = 'Methow'

y %>%
  filter(HUC8NmNRCS %in% wtsd) %>%
  filter(chnk == 'Yes') %>%
  st_as_sf(coords = c('Lon', 'Lat'),
           crs = 4326) %>%
  st_transform(crs = 5070) %>%
  ggplot() +
  geom_sf(aes(color = chnk_per_m2)) +
  scale_color_viridis_c(direction = -1) +
  theme(axis.text = element_blank()) +
  labs(color = expression(Parr / m^2),
       title = paste(wtsd, collapse = ' '))

Validation

We can validate Chinook summer parr estimates of capacity with spawner-recruit data from a number of watersheds. Morgan Bond at the Northwest Fisheries Science Center helped compile this data set with help from a variety of agencies and organizations including WDFW, ODFW, IDFG, USFS, BioAnalysts, Biomark, Nez Perce, etc. Most of the parr (recruits) data comes from rotary screw traps (with the exception of the Chiwawa, where snorkel surveys of parr were used to estimate parr directly). Estimates of parr-to-presmolt and parr-to-smolt survival were used to transform estimates of fall and spring emigrants to estimates of equivalent summer parr. Spawner estimates came from either redd counts or direct estimates of spawners.

From this dataset, we fit several spawner-recruit curves to each location, including Beverton-Holt, Ricker and hockey stick. The data at some locations led to some curves not fitting (no model convergence). Each of these spawner-recruit curves also generated an estimate of summer parr carrying capacity, which we can then compare to QRF estimates. We want to emphasize that the data for the spawner-recruit curves and the QRF estimates are completely independent, and therefore provide a good way to validate the QRF estimates of carrying capacity.

data("chnk_domain")
# read in trap polygons
trap_poly = read_sf('../data/raw/spawn_rec/trap_regions.shp') %>%
  # st_transform(crs = 5070)
  st_transform(st_crs(chnk_domain)) %>%
  select(-Id)

capacity_pts = all_preds %>%
  st_as_sf(coords = c('Lon', 'Lat'),
           crs = 4326)

qrf_caps = trap_poly %>%
  split(list(.$trap_name)) %>%
  map_df(.id = 'Trap',
         .f = calc_watershed_cap,
         spp_range = chnk_domain,
         capacity_pts = capacity_pts,
         capacity_name = "chnk_per_m",
         capacity_se_name = 'chnk_per_m_se') %>%
  mutate(tot_cap_cv = tot_cap_se / tot_cap) %>%
  mutate(Trap = recode(Trap,
                       'Minam R.' = 'Minam River',
                       'Entiat R.' = 'Entiat River',
                       'Crooked Fork Creek' = 'Crooked Fork',
                       'Upper Grande Ronde R.' ='Upper Grande Ronde River',
                       'Toucannon R.' = 'Tucannon River',
                       'Pahsimeroi' = 'Pahsimeroi River',
                       'Hood R.' = 'Hood River',
                       'John Day Middle Fork' = 'Middle Fork John Day',
                       'Upper Salmon River' = 'Upper Salmon River Chinook',
                       'John Day Upper Mainstem' = 'Upper Mainstem John Day',
                       'East Fork Salmon' = 'East Fork Salmon River',
                       'Lostine R.' = 'Lostine River'))

# qrf_caps %>%
#   filter(is.na(tot_cap))

We can estimate a Beverton-Holt spawner-recruit curve with the capacity parameter fixed to the QRF estimate of capacity for that watershed.

library(FSA)
bh1 <- srFuns('BevertonHolt',
              param = 3)

data("spawn_rec_data")
# fit Beverton-Holt curve with capacity fixed at QRF capacity prediction
qrf_params = spawn_recr_data %>%
  group_by(Population) %>%
  nest() %>%
  left_join(qrf_caps,
            by = c('Population' = 'Trap')) %>%
  filter(!is.na(tot_cap)) %>%
  mutate(form = 'QRF',
         inits = map2(.x = data,
                      .y = tot_cap,
                     .f = function(x, y) {
                       inits = srStarts(Parr ~ Spawners,
                                        data = x,
                                        type = 'BevertonHolt',
                                        param = 3)

                       inits$a = if_else(inits$a < 0, 
                                         2e-3, 
                                         inits$a)
                       inits$b = 1 / y
                       return(inits)
                       }),
         mod_fit = map2(.x = data,
                        .y = inits,
                        .f = function(x, y) {
                          fit = try(nls(log(Parr) ~ log(bh1(Spawners, a, b)),
                                        data = x,
                                        start = y,
                                        algorithm = 'port',
                                        lower = c(0, y['b']),
                                        upper = c(Inf, y['b'])))
                        }),
         coefs = map(mod_fit,
                     .f = function(x) {
                       if(class(x) == 'try-error') {
                         return(as.numeric(NA))
                       } else coef(x)
                       }),
         preds = map2(.x = data,
                      .y = mod_fit,
                      .f = function(x, y) {
                        if(class(y) != 'try-error') {
                          tibble(Spawners = 1:max(x$Spawners)) %>%
                            mutate(Parr = predict(y,
                                                  newdata = .),
                                   Parr = exp(Parr))
                        } else return(tibble(Spawners = NA,
                                             Parr = NA))
                      })) %>%
  rename(est = tot_cap,
         se = tot_cap_se) %>%
  select(-c(area:tot_length, tot_cap_cv)) %>%
  mutate(cv = se / est)

data("spawn_rec_params")

spawn_rec_params %<>%
  bind_rows(qrf_params)

# pull out estimates of capacity
cap_tbl = spawn_rec_params %>%
  select(Population, form, est, se, cv) %>%
  arrange(Population, form)

# make a few estimates NA (too big)
cap_tbl %<>%
  mutate_at(vars(est, se, cv),
            list(~ if_else(Population %in% c('Entiat River',
                                             'Marsh Creek',
                                             'Minam River',
                                             'Tucannon River') &
                             form == 'BevertonHolt',
                           as.numeric(NA), .)))


# pull out fitted spawner recruit curves
curve_fits = spawn_rec_params %>%
  select(Population, form, preds) %>%
  unnest(cols = "preds") %>%
  arrange(Population, form)

# define polygons for uncertainty shading
plot_max = Inf
plot_max = 5e5
cap_poly = cap_tbl %>%
  mutate(lwrCI = est + se * qnorm(0.025),
         uprCI = est + se * qnorm(0.975)) %>%
  mutate(lwrCI = if_else(lwrCI < 0, 0, lwrCI),
         uprCI = if_else(uprCI > plot_max, plot_max, uprCI)) %>%
  left_join(spawn_rec_params %>%
              select(Population, data) %>%
              unnest(cols = "data") %>%
              group_by(Population) %>%
              summarise_at(vars(Spawners, Parr),
                           list(min = min,
                                max = max),
                           na.rm = T)) %>%
  # select(Population, form, ends_with('min'), ends_with('max'), ends_with('CI'))
  group_by(Population, form) %>%
  summarise(coords = list(tibble(x = c(0, 0, Spawners_max, Spawners_max),
                                 y = c(lwrCI, uprCI, uprCI, lwrCI)))) %>%
  ungroup() %>%
  unnest()

# which watersheds to plot?
keep_wtsds = qrf_caps %>%
  filter(!is.na(tot_cap)) %>%
  filter(!Trap %in% c('Methow R.')) %>%
  pull(Trap)

spawn_rec_comp_p = spawn_rec_params %>%
  # filter out a couple populations with super poor fits
  filter(Population %in% keep_wtsds) %>%
  select(Population, data) %>%
  unnest(cols = "data") %>%
  distinct() %>%
  ggplot(aes(x = Spawners,
             y = Parr)) +
  geom_polygon(data = cap_poly %>%
                 # filter out a couple populations with super poor fits
                 filter(Population %in% keep_wtsds),
              aes(x = x,
                  y = y,
                  fill = form),
              alpha = 0.2) +
  geom_line(data = curve_fits %>%
              # filter out a couple populations with super poor fits
              filter(Population %in% keep_wtsds),
            aes(color = form),
            lwd = 1.5) +
  geom_hline(data = cap_tbl %>%
               # filter out a couple populations with super poor fits
               filter(Population %in% keep_wtsds),
             aes(yintercept = est,
                 color = form),
             linetype = 2) +
  geom_point(aes(shape = Spawner_type),
             size = 3) +
  facet_wrap(~ Population,
             scales = 'free') +
  scale_color_brewer(palette = 'Set1',
                     breaks = c('QRF',
                                'BevertonHolt',
                                'Ricker',
                                'Hockey'),
                     labels = c('QRF', 
                                'Beverton Holt',
                                'Ricker',
                                'Hockey Stick'),
                     name = 'Model') +
  scale_fill_brewer(palette = 'Set1',
                    breaks = c('QRF',
                               'BevertonHolt',
                               'Ricker',
                               'Hockey'),
                    labels = c('QRF', 
                               'Beverton Holt',
                               'Ricker',
                               'Hockey Stick'),
                    name = 'Model') +
  labs(shape = 'Spawner\nType')

dodge_width = 0.5
sr_est_comp_p = cap_tbl %>%
  filter(Population %in% keep_wtsds) %>%
  mutate(lwrCI = est + qnorm(0.025) * se,
         lwrCI = if_else(lwrCI <= 0, 1, lwrCI),
         uprCI = est + qnorm(0.975) * se,
         uprCI = if_else(uprCI > 1.3e6, 1.3e6, uprCI)) %>%
  mutate(type = if_else(form == 'QRF',
                        'QRF', 'S-R')) %>%
  ggplot(aes(x = Population,
             y = est,
             color = form,
             group = form)) +
  geom_errorbar(aes(ymin = lwrCI,
                    ymax = uprCI),
                position = position_dodge(width = dodge_width),
                width = 0) +
  geom_point(aes(shape = type),
             fill = 'white',
             size = 2,
             position = position_dodge(width = dodge_width)) +
  scale_shape_manual(values = c('QRF' = 19,
                                'S-R' = 21),
                     name = 'Type') + 
  scale_color_brewer(palette = 'Set1',
                     breaks = c('QRF',
                                'BevertonHolt',
                                'Ricker',
                                'Hockey'),
                     labels = c('QRF',
                                'Beverton Holt',
                                'Ricker',
                                'Hockey Stick'),
                     name = 'Model') +
  # scale_y_continuous(trans = 'log') +
  # theme(axis.text.x = element_text(angle = 90),
  #       axis.text.y = element_blank()) +
  theme(axis.text = element_blank()) +
  facet_wrap(~ Population,
             scales = 'free') +
  labs(y = 'Capacity Estimate',
       x = NULL)

We then plotted the data, as well as the fitted curves, including the one which uses QRF estimates of capacity as one of the parameters, on the same plot, to compare the various estimates.

spawn_rec_comp_p

sr_est_comp_p

Watershed Capacity Estimates

We can use the predictions of capacity at all master sample points to estimate carrying capacity at the watershed level. To do so, we join the points to a stream layer, and estimate the average capcity (fish per meter) from all the points on each stream. We then multiply that average by the length of the stream. Finally, we can add the capacities of all the streams within a watershed to determine the overall carrying capacity.

We can also map those capacities on the landscape. For mapping, we have found that displaying the capacity in units of fish per m$^2$ is more intuitive for the viewer to interpret.

Tables

chnk_pops = st_read('../data/raw/domain/CHNK_SPSU_All.shp',
                    quiet = T) %>%
  st_transform(crs = st_crs(chnk_domain))

lem_cap = chnk_pops %>%
  filter(grepl('Lemhi', NWR_NAME)) %>%
  calc_watershed_cap(wtsd_polygon = .,
                     spp_range = chnk_domain,
                     capacity_pts = capacity_pts,
                     by_stream = T)

lem_cap %>%
  pander()

Maps

# define watershed polygon(s)
# my_poly = chnk_pops %>%
#   filter(grepl('Minam', NWR_NAME) |
#            grepl('Grande Ronde', NWR_NAME))

my_poly = chnk_pops %>%
  filter(grepl('Lemhi', NWR_NAME))

my_lines = get_flowlines(stream_order = 2,
                         wtsd_polygon = my_poly)

# grab all prediction points in watershed polygon
my_pts = all_preds %>%
  st_as_sf(coords = c('Lon', 'Lat'),
           crs = 4326) %>%
  st_transform(st_crs(my_poly)) %>%
  st_intersection(my_poly)

# clip to Chinook range
my_pts %<>%
  st_transform(st_crs(chnk_domain)) %>%
  as_Spatial() %>%
  maptools::snapPointsToLines(lines = chnk_domain %>%
                                as_Spatial,
                              maxDist = 500) %>%
  as('sf')

# make map
ggplot() +
  geom_sf(data = my_poly,
          fill = 'gray') +
  geom_sf(data = my_lines,
          color = 'blue') +
  geom_sf(data = my_pts,
          aes(color = chnk_per_m2)) +
  scale_color_viridis_c(direction = -1) +
  theme_bw() +
  theme(axis.text = element_blank()) +
  labs(color = expression(Parr / m^2))

References

KevinSee/QRFcapacity documentation built on Feb. 27, 2023, 3:57 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

KevinSee/QRFcapacity
Carrying Capacity with QRF

In KevinSee/QRFcapacity: Carrying Capacity with QRF

Introduction

Data

Selecting Covariates

Methods

Results

Fit QRF Models

Methods

Predict at Habitat Sites

Methods

Results

Extrapolation to Other Sites

Validation

Watershed Capacity Estimates

Tables

Maps

References

R Package Documentation

Browse R Packages

We want your feedback!

KevinSee/QRFcapacity Carrying Capacity with QRF

In KevinSee/QRFcapacity: Carrying Capacity with QRF

Introduction

Data

Selecting Covariates

Methods

Results

Fit QRF Models

Methods

Predict at Habitat Sites

Methods

Results

Extrapolation to Other Sites

Validation

Watershed Capacity Estimates

Tables

Maps

References

R Package Documentation

Browse R Packages

We want your feedback!

KevinSee/QRFcapacity
Carrying Capacity with QRF