In PaulRegular/SimSurvey: Test Surveys by Simulating Spatially-Correlated Populations

knitr::opts_chunk$set(echo = FALSE,
                      message = FALSE,
                      warning = FALSE, 
                      fig.width = 10, 
                      fig.height = 5.2)
knitr::opts_knit$set(root.dir = "../..")

# xaringan::inf_mr()

library(SimSurvey)
library(plotly)
library(crosstalk)
library(raster)
library(lattice)
library(viridis)
library(data.table)
load("analysis/cod_sim_exports/2018-10-26_age_clust_test/test_output.RData")

boat_txt <- '<br>
<img src="graphics/teleost_twitter_crop.jpg" width="400"/>
<font size="1"> 
<br>
&nbsp; <a href="https://twitter.com/coastguardcan/status/879410397790515201">Canadian Coast Guard | Twitter</a>
</font>'

## wrapper for layout with some different defaults for this slideshow
tight_layout <- function(p, 
                         plot_bgcolor = "transparent",
                         paper_bgcolor = "transparent",
                         margin = list(l = 0, r = 0, t = 0, b = 0, pad = 0),
                         font = list(size = 14, family = "'Montserrat', sans-serif"), 
                         ..., data = NULL) {
  layout(p, plot_bgcolor = plot_bgcolor, paper_bgcolor = paper_bgcolor,
         margin = margin, font = font, ..., data = data)
}

N?

Fisheries-independent trawl surveys {.columns-2}

Conducted by many RFMOs
Becoming increasingly important in stock assessment
Very costly!
General goal:
- Maximize information & minimize expenses

cat(boat_txt)

Fisheries-independent trawl surveys {.columns-2}

How much survey effort is enough?
- Difficult to answer
  - Multi-stage sampling
  - Intra-haul correlation

cat(boat_txt)

Multi-stage sampling

Surveys are generally designed to:

1 | 2 | 3 :----: | :----: | :----: Conduct sets at multiple locations | Sub-sample catch for length measurements | Sub-sample length groups for age determination | |

Intra-haul correlation

Sampling clusters of fish that have similar characteristics (e.g. length)

Results in samples with positive intra-haul correlation
- Reduces effective sample size

Graphic from Tunstrøm et al (2013) PLoS Comput Biol 9(2): e1002915.

Fisheries-independent trawl surveys {.columns-2}

How much survey effort is enough?
- Attempt to answer using:
  - Sub-sampling
    - Based in reality
    - Truth unknown
  - Simulation
    - Truth known
    - Reality?

cat(boat_txt)

Reality is complicated {data-background=graphics/school_light.png data-background-size=cover}

Dynamic abundance
Spatial aggregation
Survey design
Objective: build a simulation that emulates reality

Simulation steps {.smaller}

Simulate abundance
- Common cohort model
Simulate spatial aggregation
- Includes depth associations and noise correlated across space, years and ages
Simulate surveys
- Stratified random surveys with different sampling protocol
Analyze
- Conduct stratified analyses and compare truth vs. estimate

Simulate abundance

Common cohort model
- $N_{a,y} = N_{a-1,y-1} e^{-Z_{a-1,y-1}}$
First year filled via exponential decay
- $N_{a,1} = N_{a-1,1} e^{-Z_{a-1,1}}$
Random-walk in recruitment
- $log(N_{1,y}) = log(\mu_r) + \epsilon_{y}$, where $\epsilon_{y} \sim N(\epsilon_{y - 1}, \sigma_{r}^{2})$
Total mortality correlated across ages and years
- $log(Z_{a,y}) = log(\mu_Z) + \delta_{a,y}$, where $\delta_{a,y} \sim N(0, \Sigma_{a,y})$
  - Using covariance, $\Sigma_{a,y}$, described in Cadigan [-@Cadigan2016]

Simulate abundance {.tighter}

Trend in total abundance

plot_trend(sim) %>% 
  tight_layout(yaxis = list(rangemode = "tozero"),
               margin = list(l = 70, r = 0, t = 0, b = 40, pad = 0))

Simulate abundance {.tighter}

Trend in abundance at age

plot_surface(sim) %>% tight_layout(scene = list(xaxis = list(range = c(1, 10))))

Simulate spatial aggregation

Generate a square grid of $s$ locations, resolution $R$ and a depth gradient
Distribute population through space and add spatial-temporal noise
- $log(N_{a,y,s}) = log(N_{a,y}) + \eta_{a,y,s}$
- $\eta_{a,y,s}$ includes parabolic relationship with depth and covariance between ages, years and space
  - $\eta_{a,y,s} = \frac{(d_s - \mu_{d}) ^ 2}{2 \sigma_d^2} + \xi_{a,y,s}$
  - $\xi_{a,y,s}$ is a combination of Matérn covariance and the age-year covariance described in Cadigan [-@Cadigan2016]

Simulate spatial aggregation {.tighter}

Distribution for ages 1-6 in years 1-6

plot_distribution(sim, ages = 1:6, years = 1:6, type = "heatmap") %>% 
  tight_layout(margin = list(l = 0, r = 0, t = 40, b = 0, pad = 0))

Simulate survey

Split grid into $N_{div}$ divisions and $N_{strat}$ depth-based strata
Set allocation defined by specifying set density, $D_{sets}$
Binomial sampling was used to simulate catch:
- $n_{a,y,s} \sim Bin(N_{a,y,s}, \frac{A_{trawl}}{A_{cell}} q_a )$
- Accounts for trawl size relative to cell size, $\frac{A_{trawl}}{A_{cell}}$, and catchability, $q_a$ (controlled using a logistic curve)

Simulate survey

Lengths simulated using the original von Bertalanffy growth curve [@Cailliet2006]:
- $log(l) = log(l_{\infty} - (l_{\infty} - l_0) e^{-Ka}) + \varepsilon$, where $\varepsilon \sim N(0, \sigma_{l}^2)$
Sub-sampling is then conducted
- A maximum number of lengths are measured per set, $M_{lengths}$
- A maximum number of ages, $M_{ages}$, are sampled per length group, $l_{group}$, per division

Simulate survey {.tighter}

Catch in year 1 survey simulation 1

plot_survey(sim) %>% 
  tight_layout(margin = list(l = 0, r = 0, t = 0, b = 40, pad = 0))

Settings and analysis {.build}

Roughly based simulation on 3Ps Cod
A series of 175 surveys were tested

| Parameter name | Symbol | Setting | |:----------------------- | :-------------- | :------------------------------------------------ | | Set density | $D_{sets}$ | 0.0005, 0.001, 0.002, 0.005, 0.01 sets / km^2^ | | Length sampling effort | $M_{lengths}$ | 5, 10, 20, 50, 100, 500, 1000 lengths / set | | Age sampling effort | $M_{ages}$ | 2, 5, 10, 20, 50 ages / length group / division |

Settings and analysis

Each survey was run 1000 times over the same population
Abundance estimates, $\hat{I}_{a,y,k}$, were obtained using a stratified analysis [@Smith1981]
Used root-mean-squared error (RMSE) as a measure of the precision and bias of the abundance estimates from each survey:
- $RMSE = \sqrt{\frac{ \sum_{a = 1}^{N_a} \sum_{y = 1}^{N_y} \sum_{k = 1}^{N_k} (\hat{I}{a,y,k} - I{a,y})^2}{N_a N_y N_k}}$
- Where $I_{a,y}$ is the true abundance available to the survey
Powered by R and worked into a package (https://github.com/PaulRegular/SimSurvey)

Assumptions

Population is uniformly distributed within a cell

The survey is an instantaneous snapshot of the population

Fish are aged at random throughout the division within length bins

Ages are estimated without error

Trawl dimensions are perfectly standard

Truth vs. estimate

Total abundance

plot_total_strat_fan(sim, surveys = 1:5, 
                     plot_bgcolor = "transparent", paper_bgcolor = "transparent",
                     font = list(size = 14, family = "'Montserrat', sans-serif"))

Truth vs. estimate

Abundance at age

surveys <- sim$surveys[set_den %in% c(0.0005, 0.001, 0.002, 0.01) &
                         lengths_cap %in% c(10, 100, 500, 1000) &
                         ages_cap %in% c(5, 10, 50), ]
plot_age_strat_fan(sim, surveys = surveys$survey, 
                   years = 1:5, ages = sim$ages, select_by = "year",
                   plot_bgcolor = "transparent", paper_bgcolor = "transparent",
                   font = list(size = 14, family = "'Montserrat', sans-serif"))

Truth vs. estimate

Abundance at age

plot_age_strat_fan(sim, surveys = surveys$survey, 
                   ages = 2:6, years = sim$years, select_by = "age",
                   plot_bgcolor = "transparent", paper_bgcolor = "transparent",
                   font = list(size = 14, family = "'Montserrat', sans-serif"))

Truth vs. estimate {.tighter}

RMSE ~ sampling protocol

slide_font <- list(size = 14, color = rgb(121, 121, 121, maxColorValue = 255))
surface_font <- list(titlefont = slide_font, 
                     tickfont = slide_font)
surface_scene <- list(
  xaxis = surface_font,
  yaxis = surface_font,
  zaxis = surface_font
)
plot_error_surface(sim, plot_by = "rule") %>% 
  tight_layout(scene = surface_scene,
               margin = list(l = 0, r = 0, t = 40, b = 0, pad = 0))

Truth vs. estimate {.tighter}

RMSE ~ sample size

plot_error_surface(sim, plot_by = "samples") %>% 
  tight_layout(scene = surface_scene,
               margin = list(l = 0, r = 0, t = 40, b = 0, pad = 0))

Truth vs. estimate {.smaller}

Recap

Sets = primary sampling unit
- Only way to improve stratified estimates of total abundance
- Best way to improve stratified estimates of abundance at age
Increasing length sampling does not always improve estimates
- At low set densities, more is worse
- At high set densities, more is better
Sampling effort affects both bias and percision
- Stratified estimates are supposed to be unbiased?!
- May be related to age-specific clustering

Check bias

Turned off age-specific clustering

main_sim <- sim
load("analysis/cod_sim_exports/2018-09-07_test/test_output.RData")
plot_age_strat_fan(sim, surveys = surveys$survey,
                   ages = 2:6, years = sim$years, select_by = "age",
                   plot_bgcolor = "transparent", paper_bgcolor = "transparent",
                   font = list(size = 14, family = "'Montserrat', sans-serif"))

Check bias {.tighter}

Turned off age-specific clustering

plot_error_surface(sim, plot_by = "samples") %>% 
  tight_layout(scene = surface_scene,
               margin = list(l = 0, r = 0, t = 40, b = 0, pad = 0))

Check bias

Turned off age-specific clustering

In this scenario,
- Bias is negligible
- Increasing sub-sampling is always beneficial
  - No age-specific clustering means that one fish is as "effective" as the next
- Sub-sampling fewer fish at more locations appears better than many fish at fewer locations
  - Though more sub-samples improve perception of the age-distribution, it does little to inform magnitude

Check bias

Why is there bias in the main scenario? Why is extra sub-sampling sometimes useful / sometimes detrimental?

Likely related to age-specific clustering
Alternate analyses (e.g. ogmap, geostatistical models) may be able to account for such clustering and, therefore, provide estimates with less bias
Requires further investigation

Conclusions

Results suggest that

Stratified estimates are not always unbiased
Sets are the most effective way to improve estimates
Excessive sub-sampling may be detrimental if set density is low

Conclusions

Caution: this simple simulation is far from perfect, focuses on one case study, and lacks a cost component

Makes it difficult to be prescriptive
Nonetheless, results echo the growing body of literature that concludes
- Extra sub-sampling is an ineffective means of improving estimates because within-set samples are correlated
- If the goal is to maximize information, it is likely better to stop collecting psudoreplicates and, instead, focus efforts on other species or the next set

Potential future directions

Tailor to other species and assess sampling protocol
Add a time component to assess trade-offs / efficiency
Assess the effects of
- Missing strata
- Variable trawl area
- Random vs. stratified age sampling
Use this as a platform to test spatial models
Use as an operating model for an MSE
Others?

Acknowledgements

Feedback and advice: Alejandro Buren, Dave Cote, Karen Dwyer, Geoff Evans, Brian Healey, Paul Higdon, Danny Ings, Mariano Koen-Alonso, Joanne Morgan, Derek Osborne, Dwayne Pittman, Don Power, Martha Robertson, Mark Simpson, Brad Squires, Don Stansbury, Peter Upward, ...

Support: Fisheries and Oceans Canada and NSERC

References {.smaller}

PaulRegular/SimSurvey documentation built on Feb. 4, 2025, 11:17 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

PaulRegular/SimSurvey Test Surveys by Simulating Spatially-Correlated Populations

In PaulRegular/SimSurvey: Test Surveys by Simulating Spatially-Correlated Populations

N?

Fisheries-independent trawl surveys {.columns-2}

Fisheries-independent trawl surveys {.columns-2}

Multi-stage sampling

Intra-haul correlation

Fisheries-independent trawl surveys {.columns-2}

Reality is complicated {data-background=graphics/school_light.png data-background-size=cover}

Simulation steps {.smaller}

Simulate abundance

Simulate abundance {.tighter}

Trend in total abundance

Simulate abundance {.tighter}

Trend in abundance at age

Simulate spatial aggregation

Simulate spatial aggregation {.tighter}

Distribution for ages 1-6 in years 1-6

Simulate survey

Simulate survey

Simulate survey {.tighter}

Catch in year 1 survey simulation 1

Settings and analysis {.build}

Settings and analysis

Assumptions

Truth vs. estimate

Total abundance

Truth vs. estimate

Abundance at age

Truth vs. estimate

Abundance at age

Truth vs. estimate {.tighter}

RMSE ~ sampling protocol

Truth vs. estimate {.tighter}

RMSE ~ sample size

Truth vs. estimate {.smaller}

Recap

Check bias

Turned off age-specific clustering

Check bias {.tighter}

Turned off age-specific clustering

Check bias

Turned off age-specific clustering

Check bias

Conclusions

Conclusions

Potential future directions

Acknowledgements

References {.smaller}

R Package Documentation

Browse R Packages

We want your feedback!

PaulRegular/SimSurvey
Test Surveys by Simulating Spatially-Correlated Populations