SURVEY BIOMASS INDEX TRENDS {#app:survey-trend-models}

Fisheries and Oceans Canada conducts a number of fishery-independent multispecies research surveys annually or biannually. These include four synoptic bottom trawl surveys and two longline hook surveys (Figure \@ref(fig:intro-maps)). The survey areas correspond roughly with Groundfish Management areas (Figure \@ref(fig:management-map)) where fishery quotas are allocated in British Columbia waters. For the synoptic trawl surveys, the West Coast Vancouver Island survey corresponds roughly to management areas 3CD, the Queen Charlotte Sound survey corresponds roughly to areas 5AB, the Hecate Strait survey corresponds roughly to areas 5CD, and the West Coast Haida Gwaii surveys corresponds roughly to area 5E. The Hard Bottom Longline survey is split into northern and southern segments. The southern survey area corresponds roughly to management areas 3CD and 5AB while the northern survey area corresponds roughly to areas\ 5CDE.

SYNOPTIC BOTTOM TRAWL SURVEYS

DFO, together with the Canadian Groundfish Research and Conservation Society, implemented a coordinated set of bottom trawl surveys that together cover the continental shelf and upper slope of most of the BC coast. The surveys follow a random depth stratified design and use the same bottom trawl fishing gear and fishing protocols [@sinclair2003syn]. The surveys were designed to provide a synopsis of all species available to bottom trawl gear as opposed to focusing on specific species. There are a total of four synoptic (SYN) surveys: Hecate Strait (HS), West Coast Vancouver Island (WCVI), Queen Charlotte Sound (QCS), and West Coast Haida Gwaii (WCHG) (Figure \@ref(fig:intro-maps)). The Queen Charlotte Sound and West Coast Haida Gwaii surveys have been conducted on chartered commercial fishing vessels, while the Hecate Strait and West Coast Vancouver Island surveys have been conducted on the Canadian Coast Guard research trawler WE Ricker or chartered commercial fishing vessels when the WE Ricker was not available. Two of the synoptic surveys are conducted each year on an alternating basis so that each survey is conducted every two years.

HARD BOTTOM LONGLINE SURVEYS

The Pacific Halibut Management Association, in consultation with DFO, initiated a depth-stratified, random design research longline survey conducted with chartered commercial fishing vessels in 2006. These are referred to as the Hard Bottom Longline Outside surveys. The survey employs standardized longline snap gear and fishing methods and alternates annually between the northern and southern portions of BC. The survey is designed to provide catch rates of all species and biological samples of rockfish from the outside coastal waters of BC for stock assessment.

Hard Bottom Longline Inside surveys are conducted within management area 4B. These surveys were designed to provide fishery independent indices of abundance together with biological samples to improve the assessment of Yelloweye (Sebastes ruberrimus) and Quillback (S. maliger) Rockfish for the 4B management region. They began in Johnstone Strait and Discovery Passage in Pacific Fishery Management areas 12 and 13 in 2003 and 2004, and now alternate years to cover the northern (areas 12 and 13) and southern (areas 14--20, 28, 29) portions of the inside waters. These surveys also employ standardized longline snap gear and fishing methods.

INTERNATIONAL PACIFIC HALIBUT COMMISSION FISHERY INDEPENDENT SURVEY

The International Pacific Halibut Commission’s (IPHC) fishery independent setline survey is the longest times series of longline survey data in BC. It provides distribution, biomass, age, growth and maturity data that are used in the annual assessment of Pacific Halibut (Hippoglossus stenolepis). In Appendix \@ref(app:iphc-survey-index) we describe how we use data from the survey to construct a consistent index of abundance for other species over as long a time series as possible, despite the survey design changing through the years.

FURTHER DETAILS OF SURVEYS

Details on the design of the various surveys referenced in this report can be found in the following documents:

  1. Synoptic Survey, Queen Charlotte Sound (SYN QCS): @williams2018synqcs

  2. Synoptic Survey, West Coast Vancouver Island (SYN WCVI): @nottingham2017synwcvi

  3. Synoptic Survey, Hecate Strait (SYN HS): @wyeth2018synhs

  4. Synoptic Survey, West Coast Haida Gwaii (SYN WCHG): @williams2018synwchg

  5. Hard Bottom Longline Survey, Outside (HBLL OUT): @doherty2019hbllout

  6. Hard Bottom Longline Survey, Inside (HBLL INS): @lochead2007irf

  7. Hecate Strait Multispecies Assemblage Survey (MSA HS): @choromanski2004hsmsa

  8. International Pacific Halibut Commission fishery independent survey (IPHC FISS): @flemming2012iphc

if (!file.exists(here::here("report", "other-surveys.csv"))) {
  other_survey_data <- gfplot::get_other_surveys()
  readr::write_csv(other_survey_data, here::here("report", "other-surveys.csv"))
} 
f <- here::here("report", "other-surveys.csv")
other_surveys <- read.csv(f, stringsAsFactors = FALSE, strip.white = TRUE)
names(other_surveys) <- gfplot:::firstup(tolower(gsub("_", " ", names(other_surveys))))
other_surveys$X <- NULL # just in case

\clearpage

csasdown::csas_table(other_surveys, caption = "Other surveys conducted by DFO in the Pacific region that may be applicable for species-specific analyses. Within this report, these surveys are only featured in the survey-data-availability panels in the lower right of each set of figure pages.", format = "latex") %>% 
  kableExtra::column_spec(1, width = "4in") %>%
  kableExtra::column_spec(2, width = "2in")

The main biomass index trends illustrated on the figure pages represent the 'design-based' index trends that have historically been used in Pacific Biological Station groundfish stock assessments. We extracted the trawl and longline survey relative biomass index trends from GFBio, which are generated using the same approach that has been used in all recent BC groundfish stock assessment reports. The code to perform the calculations was originally written by Norm Olsen at Pacific Biological Station and is automatically applied to the available survey data to generate the indices in the GFBio database. We have included the relevant equations below for clarity. We also compare geostatistical model-based estimates of biomass index trends for the trawl surveys (Section \@ref(sec:geostatistical)).

TRAWL SURVEY DESIGN-BASED ESTIMATION {#sec:trawl-methods}

For all trawl surveys, and for a given species, we calculated the relative biomass density $B$ in year $t$ as:

\begin{equation} B_t = \sum_{i = 1}^k C_{t,i} A_i \end{equation}

where $C_{t,i}$ represents the mean CPUE in kg/km^2^ for the species in year $t$ and stratum $i$, $A_i$ represents the area of stratum $i$ in km^2^, and $k$ represents the number of strata. We calculated the CPUE ($C_{t,i}$) for a given species in year $t$ and stratum $i$ as:

\begin{equation} C_{t,i} = \frac{\sum_{j = 1}^{n_{t,i}} \left( W_{t,j} / D_{t,j} w_{t,j}\right)}{n_{t,i}} \end{equation}

where $W_{t,j}$ represents the catch weight (kg) for the species in year $t$, stratum $i$, and tow $j$; $D_{t,j}$ represents the distance travelled in km by tow $j$ in year $y$; $w_{t,j}$ represents the net opening width in km for year $y$ and tow $j$; and $n_{t,i}$ represents the number of tows in year $t$ and stratum $i$.

LONGLINE SURVEY DESIGN-BASED ESTIMATION {#sec:ll-methods}

For the HBLL surveys, and for a given species, we calculated the relative biomass density $B$ in year $t$ as:

\begin{equation} B_t = \sum_{i = 1}^k C_{t,i} A_i \end{equation}

where $C_{t,i}$ represents the mean CPUE in pieces (fish) per km\textsuperscript{2} for the species in year $t$ and stratum $i$, $A_i$ represents the area of stratum $i$ in km^2^, and $k$ represents the number of strata. We calculated the CPUE ($C_{t,i}$) for a given species in year $t$ and stratum $i$ as:

\begin{equation} C_{t,i} = \frac{\sum_{j = 1}^{n_{t,i}} \left( N_{t,j} / H_{t,j} w_{t,j}\right)}{n_{t,i}} \end{equation}

where $N_{t,j}$ represents the number of fish caught for the species in year $t$, stratum $i$, and set $j$; $H_{t,j}$ represents the number of hooks $\times$ the hook spacing in km in set $j$ in year $t$; $w_{t,j}$ represents an arbitrary swept width of 30 feet or 0.009144 km for year $t$ and tow $j$; and $n_{t,i}$ represents the number of sets in year $t$ and stratum $i$. The hook spacing is 8 feet or 0.0024384 km for the inside and outside HBLL surveys.

Details on the design-based estimation of biomass density index for the IPHC survey are shown in Appendix \@ref(app:iphc-survey-index).

DESIGN-BASED-INDEX CONFIDENCE INTERVALS

We calculated bootstrap confidence intervals on $B_t$ by repeatedly calculating $B_t$ given the above equations but each time re-sampling, with replacement, from the available tows within each stratum. We drew 1000 bootstrap replicates of $B_t$, $B_t^{\mathrm{rep}}$, and calculated 95% bias-corrected and adjusted (BCa) confidence intervals [@efron1987] on $B_t^{\mathrm{rep}}$.

GEOSTATISTICAL SPATIOTEMPORAL BIOMASS INDEX TRENDS {#sec:geostatistical}

The above-described design-based estimates assume that average fish density is the same throughout each survey stratum and that the only source of variance between samples is sampling stochasticity itself [@petitgas2001]. However, we know this is not true --- a substantial portion of fish density variation within a stratum can be attributed to habitat being of better or poorer suitability for a given fish and this suitability can be because of many factors beyond depth alone, on which the strata are stratified [@shelton2014]. We also know that fish do not perceive their habitat according to these exact strata boundaries and that ecological processes in general tend to be spatially correlated with processes closer to each other being more similar than those further apart. Design-based estimates do not take advantage of this possible spatial correlation.

Geostatistical modelling of survey data aims to address these issues by modelling fish density as a smooth spatial surface --- possibly the result of explicit habitat variables such as depth --- but also as the product of other unobserved or 'latent' spatial effects. In recent years, there has been a movement toward such 'model-based' standardization of survey biomass index trends [e.g., @shelton2014; @thorson2015; @webster2017]. We include model-based index trends for the synoptic trawl surveys as a point of comparison so the reader can gauge when the two approaches may differ. Large differences are likely a result of the random positioning of the survey sets for a given year ending up in particularly good or poor habitat for a given species. Authors of BC groundfish stock assessments may want to consider model-based index standardization on a case-by-case basis.

We use the geostatistical model as described in Appendix \@ref(app:spatial-modeling) with the addition of spatiotemporal random effects ($\epsilon_{s,t}$; defined for locations in space $s$ and time $t$) and annual predictors for the mean biomass each year:

\begin{align} y_{s,t} &\sim \mathrm{Tweedie}(\mu_{s,t}, p, \phi), \quad 1 < p < 2,\ \mu_{s,t} &= \exp \left( \bm{X}{s,t} \bm{\beta} + \omega_s + \epsilon{s,t} \right). \end{align}

As with the spatial model, the spatial random effects ($\omega_s$) are assumed to be drawn from a multivariate normal distribution with some covariance matrix $\bm{\Sigma}_\omega$:

\begin{equation} \bm{\omega} \sim \mathrm{MVNormal} \left( \bm{0}, \bm{\Sigma}_\omega \right), \end{equation}

and we assume the same for the spatiotemporal random effects, with each time slice being given its own independent set of random effects ($\bm{\epsilon}t$) with covariance matrix $\bm{\Sigma}{\epsilon,t}$:

\begin{equation} \bm{\epsilon}t \sim \mathrm{MVNormal} \left( \bm{0}, \bm{\Sigma}{\epsilon,t} \right). \end{equation}

The spatial random effects account for spatial factors that are constant across time, for example, depth and substrate type. The spatiotemporal random effects account for factors that vary from year to year spatially such as bottom temperature, water circulation patterns, species interactions, and species movement.

Here we use Pacific Cod as an example species to illustrate the model components. The approach includes the same generation of spatial 'knots' as in the spatial model described in Appendix \@ref(app:spatial-modeling) (Figure \@ref(fig:sdmTMB-spt-spde)). We use 200 knots, which for spatial coverage of the synoptic trawl surveys, seems to be of adequately high resolution to capture the spatial and spatiotemporal variation. We tested this assumption by increasing the number of knots for a selection of species and ensuring that the estimated trends did not qualitatively differ.

We can project predictions from the model to a fine-scale (2 km $\times$ 2 km) grid using the covariance projection matrix (Figures \@ref(fig:sdmTMB-spt-plot-all-effects-sqrt), \@ref(fig:sdmTMB-spt-plot-all-effects-sqrt-depth)). We can also look at the individual components of the model. For the models without fixed effect predictors for depth, the fixed effect predictions each year are constant spatially (Figure \@ref(fig:sdmTMB-spt-plot-fixed-effects)). The spatial random effects are constant across years (Figure \@ref(fig:sdmTMB-spt-plot-spatial-effects)) and the spatiotemporal random effects vary across years (Figure \@ref(fig:sdmTMB-spt-plot-spatiotemporal-effects)). We can look at residuals through space and time to check if there appears to be remaining spatial or spatiotemporal autocorrelation (Figure \@ref(fig:sdmTMB-spt-spatial-residuals)).

We can then calculate expected biomass $B_t$ in year $t$ as:

\begin{equation} B_t = \sum_{j = 1}^{n_j} w_j \cdot \exp \left( \bm{X}{j,t} \bm{\beta} + \omega_j + \epsilon{j,t} \right), \end{equation}

where $j$ references a 2 km $\times$ 2 km grid cell within the survey domain and $w_j$ represents the weight or area of that grid cell (4 km^2^) (Figure \@ref(fig:sdmTMB-plot-index)). In other words, we sum the predicted biomass across all grid cells within the survey domain for each year. We generated standard errors on the annual estimates of log biomass via the generalized Delta method as implemented in TMB [@kristensen2016]. Similar to the findings of @thorson2015, we found that the model-based biomass index trends often had lower CVs (coefficient of variations) than the design-based index trends and often helped stabilize biomass estimates for outlying years compared to the design-based index trends (e.g., Figure \@ref(fig:sdmTMB-plot-index)).

We found little difference in the predicted biomass index between models that included or did not include depth covariates (e.g., Figure \@ref(fig:sdmTMB-plot-index)). We found that the main difference between models with or without depth covariates was in the models with depth covariates having slightly more spatially resolved estimates of biomass (Figure \@ref(fig:sdmTMB-spt-plot-all-effects-sqrt) vs. Figure \@ref(fig:sdmTMB-spt-plot-all-effects-sqrt-depth)). We chose to not include depth and depth squared as predictors in the main biomass index illustrated throughout this report. Whereas in nearly all cases including depth covariates generated a similar index to excluding them, in a few cases including them generated what appeared to be unrealistic deviations in biomass when the quadratic shape of the relationship between depth and biomass generated exceedingly high or low estimates of biomass on the border of the survey polygons. This remains a topic of research and will be investigated by the authors in the future.

library(dplyr)
library(ggplot2)
library(sdmTMB)

survey <- "SYN QCS"
n_knots <- 200
bias_correct <- FALSE
# dat <- readRDS(here::here("report/data-cache/walleye-pollock.rds"))
dat <- readRDS(here::here("report/data-cache/pacific-cod.rds"))
dat <- gfplot:::tidy_survey_sets(dat$survey_sets, survey, years = seq(1, 1e6),
  density_column = "density_kgpm2")
dat <- mutate(dat, density = density * 1000 * 1000)
dat <- gfplot:::interp_survey_bathymetry(dat)$data
dat <- gfplot:::scale_survey_predictors(dat)

# grid_locs <- gfplot:::make_prediction_grid(
#   filter(dat, year == max(dat$year)), survey = survey,
#   cell_width = 2)$grid
# grid_locs <- rename(grid_locs, depth = akima_depth)
# grid_locs$year <- NULL

grid_locs <- gfplot::synoptic_grid %>%
  dplyr::filter(survey == "SYN QCS") %>%
  dplyr::select(X, Y, depth)

grid_locs$depth_scaled <-
  (log(grid_locs$depth) - dat$depth_mean[1]) / dat$depth_sd[1]
grid_locs$depth_scaled2 <- grid_locs$depth_scaled^2

# Expand the prediction grid to create a slice for each time:
original_time <- sort(unique(dat$year))
nd <- do.call("rbind",
  replicate(length(original_time), grid_locs, simplify = FALSE))
nd[["year"]] <- rep(original_time, each = nrow(grid_locs))
grid_locs <- nd
spde <- make_spde(dat$X, dat$Y, n_knots = n_knots)
plot_spde(spde)
m_st <- sdmTMB(
  formula = density ~ 0 + as.factor(year),
  data = dat, time = "year", spde = spde, family = tweedie(link = "log"),
  silent = TRUE)
predictions <- predict(m_st, newdata = grid_locs, return_tmb_object = TRUE)
m_st_depth <- sdmTMB(
  formula = density ~ 0 + as.factor(year) + depth_scaled + depth_scaled2,
  data = dat, time = "year", spde = spde, family = tweedie(link = "log"),
  silent = TRUE)
predictions_depth <- predict(m_st_depth, newdata = grid_locs, return_tmb_object = TRUE)
dat$resids <- residuals(m_st) # randomized quantile residuals
# gfplot::plot_qres_histogram(dat$resids)

dat$resids_depth <- residuals(m_st_depth) # randomized quantile residuals
# gfplot::plot_qres_histogram(dat$resids_depth)
plot_map_spt <- function(dat, column) {
  ggplot(dat, aes_string("X", "Y", fill = column)) +
    geom_raster() +
    facet_wrap(~year) +
    coord_fixed() +
    labs(x = en2fr('Easting', french), y = en2fr('Northing', french))
}
ticks <- c(100, 500, 1000, 2000)
limits <- c(0, max(exp(predictions$data$est))*1.001)
plot_map_spt(predictions$data, "exp(est)") +
  ggtitle(en2fr("Prediction (fixed effects + all random effects)", french)) +
  scale_fill_viridis_c(trans = "fourth_root_power", option = "C",
    breaks = ticks, limits = limits) +
  labs(fill = if(french){
    'Estimation de la\nbiomasse (kg/km2)'
    }
    else{
      'Biomass estimate\n(kg/km^2)'
    })

\clearpage

plot_map_spt(predictions_depth$data, "exp(est)") +
  ggtitle(en2fr("Prediction (fixed effects + all random effects)", french)) +
  scale_fill_viridis_c(trans = "fourth_root_power", option = "C",
    breaks = ticks, limits = limits) +
  labs(fill = if(french){
    'Estimation de la\nbiomasse (kg/km2)'
    }
    else{
      'Biomass estimate\n(kg/km^2)'
    })

\clearpage

plot_map_spt(predictions$data, "exp(est_non_rf)") +
  ggtitle(en2fr("Prediction (fixed effects only)", french)) +
  scale_fill_viridis_c(trans = "fourth_root_power", option = "C") +
  labs(fill = if(french){
    'Estimation moyenne\n(kg/km^2)'
    }
    else{
     'Mean estimate\nkg/km^2'
      })

\clearpage

red_blue_fill <- scale_fill_gradient2(midpoint = 0,
    high = scales::muted("red"),
    mid = "white",
    low = scales::muted("blue"))

plot_map_spt(predictions$data, "omega_s") +
  ggtitle(en2fr("Spatial random effects only", french)) +
  red_blue_fill +
  labs(fill = if(french){
    'Écart par rapport à la\nbiomasse prévue dans\nl’espace de relevé'
  }
    else{
      'Deviation from\nexpected biomass\nin log space'
    })

\clearpage

plot_map_spt(predictions$data, "epsilon_st") +
  ggtitle(en2fr("Spatiotemporal random effects only", french)) +
  red_blue_fill +
  labs(fill = if(french){
    'Écart par rapport à la\nbiomasse prévue dans\nl’espace de relevé'
  }
    else{
      'Deviation from\nexpected biomass\nin log space'
    })

\clearpage

red_blue_fill2 <- scale_colour_gradient2(midpoint = 0,
    high = scales::muted("red"),
    mid = "white",
    low = scales::muted("blue"))

ggplot(dat, aes(X, Y, col = resids)) +
  geom_point() + facet_wrap(~year) + coord_fixed() +
  labs(x = en2fr('Easting', french), y = en2fr('Northing', french), colour = en2fr("Residual", french)) +
  red_blue_fill2

\clearpage

ind <- get_index(predictions, bias_correct = FALSE)
ind_depth <- get_index(predictions_depth, bias_correct = FALSE)

\clearpage

# ind_design <- readRDS(here::here('report/data-cache/walleye-pollock.rds'))
ind_design <- readRDS(here::here('report/data-cache/pacific-cod.rds'))
ind_design <- ind_design$survey_index
ind_design <- ind_design %>%
  filter(survey_abbrev  == 'SYN QCS') %>%
  mutate(
    est = biomass / exp(mean(log(biomass), na.rm = TRUE)),
    lwr = lowerci / exp(mean(log(biomass), na.rm = TRUE)),
    upr = upperci / exp(mean(log(biomass), na.rm = TRUE)),
    type = 'Design based'
  )

ind_geo <- ind %>%
  mutate(
    lwr = lwr / exp(mean(log(est), na.rm = TRUE)),
    upr = upr / exp(mean(log(est), na.rm = TRUE)),
    est = est / exp(mean(log(est), na.rm = TRUE)),
    type = "Geostatistical"
  )

ind_geo_depth <- ind_depth %>%
  mutate(
    lwr = lwr / exp(mean(log(est), na.rm = TRUE)),
    upr = upr / exp(mean(log(est), na.rm = TRUE)),
    est = est / exp(mean(log(est), na.rm = TRUE)),
    type = "Geostatistical (depth)"
  )

ind_combined <- bind_rows(ind_design, ind_geo) %>%
  bind_rows(ind_geo_depth)
ggplot(ind_combined, aes(year, est, fill = type)) +
  geom_line(aes(colour = type)) +
  geom_ribbon(aes(ymin = lwr, ymax = upr), alpha = 0.4) +
  xlab(en2fr('Year', french)) +
  ylab(en2fr('Biomass estimate divided by geometric mean', french)) +
  labs(fill = en2fr('Type', french), colour = en2fr('Type', french)) +
  scale_fill_brewer(palette = "Dark2") +
  scale_colour_brewer(palette = "Dark2")

\clearpage



pbs-assess/gfsynopsis documentation built on March 26, 2024, 7:30 p.m.