Estimating Population Indices with Spatiotemporal Models {#app:stitching}

We used spatiotemporal models both to standardize individual surveys and to combine or "stitch" related surveys into BC coastwide or survey-wide indices of abundance or biomass. We first describe the models used to combine surveys and then describe updates to the models for individual surveys. These combined indices appear in the survey relative index figures (e.g., Figure \@ref(fig:survey-index)) for SYN WCHG/HS/QCS/WCVI, HBLL OUT N/S, and HBLL INS N/S. We apply the same approach for the MSSM WCVI survey and the IPHC FISS because changes to the sampling domain in both cases necessitate a similar (autoregressive) model.

We considered models that included both one and two linear predictors (described below). For any one linear predictor, we can represent the spatiotemporal model as

\begin{align} \mathbb{E}[y_{\boldsymbol{s},t}] &= \mu_{\boldsymbol{s},t},\ \mu_{\boldsymbol{s},t} &= \exp \left( \eta_{\boldsymbol{s}, t} \right),\ \eta_{\boldsymbol{s}, t} &= \omega_{\boldsymbol{s}} + \delta_{\boldsymbol{s},t} + O_{\boldsymbol{s},t}. \end{align}

Here, the expected value, $\mathbb{E}[y_{\boldsymbol{s},t}]$, of response variable $y_{\boldsymbol{s},t}$ at spatial coordinates $\boldsymbol{s}$ and time (year) $t$ is the mean at that point in space and time, given by $\mu_{\boldsymbol{s},t}$. That mean is equal to an inverse link function applied to the linear predictor $\eta_{\boldsymbol{s},t}$. The linear predictor is comprised of a spatial Gaussian random field value $\omega_{\boldsymbol{s}}$, a spatiotemporal value $\delta_{\boldsymbol{s},t}$, and an "offset" $O_{\boldsymbol{s},t}$. The offset is effectively a covariate with a coefficient fixed at 1 and allows us to control for survey effort (here, log area swept for trawl surveys, log effective hooks fished for HBLL surveys, and log effective skates fished for IPHC FISS).

The spatial Gaussian random field is assumed to be drawn from a multivariate normal (MVN) distribution with covariance matrix $\boldsymbol{\Sigma}_{\omega}$ constrained by a Matérn covariance function

\begin{equation} \boldsymbol{\omega} \sim \operatorname{MVNormal} (\boldsymbol{0}, \boldsymbol{\Sigma}_{\omega}). \end{equation}

The spatiotemporal portion is structured as a random walk with Gaussian random field annual deviations

\begin{align} \boldsymbol{\delta}{t=1} &\sim \operatorname{MVNormal} (\boldsymbol{0}, \boldsymbol{\Sigma}{\epsilon}),\ \boldsymbol{\delta}{t>1} &= \boldsymbol{\delta}{t-1} + \boldsymbol{\epsilon_{t-1}}, \: \boldsymbol{\epsilon_{t-1}} \sim \operatorname{MVNormal} \left(\boldsymbol{0}, \boldsymbol{\Sigma}_{\epsilon} \right). \end{align}

We chose this random walk structure based on ongoing research into spatiotemporal model structures that are robust to the bienniel survey structure present in the BC surveys modelled here. Such a structure has been used previously in @anderson2023sopo and @english2023. The structure is similar to the first-order autoregressive (AR1) spatiotemporal model used by the International Pacific Halibut Commission when modelling their survey data [@webster2020].

For the trawl surveys, which sample catch weight per area swept, we considered five possible families (combinations of linear predictor links and observation likelihoods): Tweedie, delta-gamma, delta-lognormal, delta-Poisson-link-gamma, and delta-Poisson-link-lognormal. We then choose the family with the lowest marginal AIC (Akaike information criterion) for each survey-species combination (among the models that converged). The @tweedie1984 likelihood with a log link is the approach described in @anderson2019synopsis. The delta-gamma and delta-lognormal families are widely used [e.g., @thorson2015] delta or hurdle [@aitchison1955] models with (1) a linear predictor for encounter probability with a logit link and a Bernoulli likelihood and (2) a linear predictor for catch rate conditional on an encounter with a log link and a gamma or lognormal likelihood. The delta-Poisson-link-gamma and delta-Poisson-link-lognormal families are described in @thorson2018poisson. In these models, the linear predictors ($\eta_1$ and $\eta_2$) represent a group number $n$ and a catch per group $w$

\begin{align} \log (n) &= \eta_1,\ \log (w) &= \eta_2. \end{align}

These get transformed [@thorson2018poisson] as

\begin{align} p &= 1 - \exp(- \exp( \log n)),\ r &= \frac{n w}{p}, \end{align}

where Eq. A.9 has its roots in a Poisson process (and the 'cloglog' link) and the probabilities $p$ and positive rates $r$ are modelled as Bernoulli and gamma or lognormal, respectively.

In the case of the hook and line surveys (IPHC, HBLL OUT, HBLL INS), which sample catch counts, we applied a negative binomial ('NB2') observation likelihood [@hilbe2011] where the variance scales quadratically with the mean. For these hook and line surveys, we applied an instantaneous catch rate (ICR) correction factor to the observed counts based on the number of baited hooks returned [@somerton1995; @obradovich2018; @anderson2019synopsis Appendix G; further background in @watson2022]. We explored a censored Poisson likelihood with additional observation-level dispersion [@watson2022], but found a poorer fit to the data (as indicated by marginal AIC and plots of residuals) for many species and model convergence issues in several cases. We plan to revisit this censored likelihood approach in a future version of the report.

We implemented these models using the Stochastic Partial Differential Equation (SPDE) approximation to Gaussian random fields with Gaussian Markov random fields [@lindgren2011] for computational efficiency using the R [@r2023] package sdmTMB [@anderson2022sdmTMB]. sdmTMB estimates the marginal log likelihood using TMB to integrate over random effects with the Laplace approximation [@kristensen2016]. This SPDE approach requires creating triangulation "meshes" [@lindgren2011] for SPDE calculations and to interpolate from the random effects to the data locations or prediction locations with bilinear interpolation. We constructed our meshes with a minimum triangle edge length of 20 km for the SYN, HBLL, and IPHC models and with a minimum edge length of 8 km for the MSSM WCVI survey, which was smaller in size.

We fit our models to all species where the proportion of positive sets across all observed survey sets was $\ge$ 5%. Due to the smaller survey area, deeper depth strata, and geographical distance to the other synoptic surveys, when the proportion of positive sets in SYN WCHG was $<$ 5%, we ommitted this survey from the modelling. We also omitted models that did not pass convergence criteria. To assess convergence, we checked that the Hessian was positive definite, that the maximum marginal log likelihood gradient was $<$ 0.001 with respect to all fixed effects, and that no Gaussian random field marginal standard deviations were approaching a boundary (interpreted as ensured their standard deviations were $>$ 0.01). For the MSSM survey we also omitted models when the maximum estimated confidence intervals were $>$ 10 times the maximum estimated biomass or when the mean coefficient of variation was $>$ 1. These are both indicative of poorly fitted models.

From our fitted models, we calculated an area-weighted biomass index by predicting density from our model across the entire survey grid, multiplying that density by grid cell area, and summing the resulting biomass across grid cells [e.g., @thorson2015]. This is further described in @anderson2019synopsis. The 2 $\times$ 2 km survey grid for the Synoptic trawl surveys and the Hard Bottom Longline surveys is shown throughout the species pages (e.g., Figure \@ref(fig:survey-maps)). We derived a survey grid for the MSSM survey by overlaying a 3 $\times$ 3 km grid covering all sampling locations between 2009 and 2021 (Figure \@ref(fig:mssm-grid)). The 3 km grid size approximately matched the resolution of the now-fixed-station sampling design. We chose the 2009 to 2021 timespan because it reflected a relatively stable sampling design relevant to recent years. Some areas that were sampled in earlier years are no longer sampled and would add considerable uncertainty to the index calculations if we predicted on these no-longer-sampled grid cells. Similarly, the 2022 survey year introduced some new survey locations that had not been surveyed since the 1970s.

knitr::include_graphics(here::here('report', 'tech-report', 'figure', 'grid-prediction-2009.png'), dpi = NA)

We derived spatiotemporal model-based indices of abundance for the individual surveys largely as described in @anderson2019synopsis with the following changes:

\clearpage



pbs-assess/gfsynopsis documentation built on March 26, 2024, 7:30 p.m.