In sigurdurthorjonsson/bionechi: Utilities for analysis of acoustic survey results

Vignette to document how capelin surveys have been bootstrapped according to a scheme described in WKICE2015. Briefly, the method consists of the following:

Find rectangle average NASC and multiply with rectangle area assumed to hold fish which gives echo abundance, EA.
Allocate the rectangle EAs and fish sampling stations to areas (or strata).
Bootstrap rectangle EAs, stations and fish on stations in parallel 10 thousand times, estimating the SSB and other stock parameters by area for each replicate.
Keep results for all areas along with the total for each replicate.

Packages

Naturally, our package bionechi, out-commented pre-amble shows how to get from scratch. We use dplyr, tidyr from the so-called tidyverse, stringr as well for some string manipulation. For the bootstrap we go multi-core and use the base package parallel.

#install.packages("devtools", repos = "https://cran.hafro.is")
#library(devtools)
#install_github("sigurdurthorjonsson/bionechi")

# 'devtools::check notes this could be removed 
library(bionechi)

library(dplyr) ; library(tidyr) ; library(stringr)
## assume 4 cores are present, adjust as necessary
library(parallel)

#install.packages("devtools", repos = "https://cran.hafro.is")
#library(devtools)
#install_github("sigurdurthorjonsson/bionechi", build_vignettes = TRUE)

library(bionechi)

library(dplyr) ; library(tidyr) ; library(stringr)
## assume 4 cores are present, adjust as necessary
library(parallel)

Data

Data are rdata with rectangle echo abundance and fishData with fish biology allocated to areas (current example as in January 2016), as prepared and documented in earlier bionechi-vignettes[to be added].

#getwd()
# when doing the vignette-thing find stuff relative to pkg/vignettes-dir
rdata <- readRDS("../data/rdata.rds")
fishData <- readRDS("../data/fishData.rds")

Table input

The first few lines of the two data sets used in the bootstrap are shown below. The rectangle data consist of r for the rectangle code for smáreitur/sr in this instance, cap is the rectangle average NASC, A is the full geographical area of the rectangle,p is the calculated proportion of the area within a perimeter set on the survey, EA is the product of the NASC, area and propotion of area assumed to hold fish, and finally area is the area or stratum code:

knitr::kable(head(rdata,5))

The fish data consist of stod or the station, l for length in cm, w the ungutted weight in grams, m is a binary classification into immatures and matures, TS is the target strenght according to the relationsship TS = 19.1 * log(L) - 74.5, sigma is the back-scattering cross section, and finally area is the area or stratum code.

knitr::kable(head(fishData, 5))

Method

Set the number of replicates, one thousand in vignette, ten thousand in practice sofar. Set number of cores, here to that available on my machine, adjust down if necessary, possibly up if higher number of processors is available.

nReps <- 1e3
# set to one since devtools::check complains about:
#   spawning simultaneous processes, 
# consider increasing when extra processores are available
nCores <- 1

The bootstrap in parallel

We use parallel::mclapply to bootstrap the tables grouped by area. We break the operation down into logical steps below.

Start the mclapply-call by distributing replicates on the available cores (i.e. 4 at a time in this case), return the list bootResult from the mclapply-call, nReps long.

## do this in a multi-core lapply 
bootResult <- mclapply(1:nReps, function(x) {

Use dplyr::sample_frac to pick rectangles at random with replacement within each area, and sum the bootstrapped echo abundance by area.

## bootstrap rectangle EA within areas and sum by area
rdata %>%
  group_by(area) %>%
  sample_frac(size = 1, replace = TRUE) %>%
  summarize(EA = sum(EA)) -> eaSum

Similarily, we pick fish on stations at random with replacement, for all stations (column name stod), hold in data frame resampledFishData.

## bootstrap fish on stations
fishData %>%
  group_by(stod) %>%
  sample_frac(size = 1, replace = TRUE) -> resampledFishData

Each station typically has 100 capelin, we get the distinct station numbers, and pick at random with replacement within area, assign the station number we are going to use to resampledStations.

## bootstrap stations
fishData %>%
  select(area, stod) %>%
  distinct() %>%
  group_by(area) %>%
  sample_frac(size = 1, replace = TRUE) -> resampledStations

Join the resampled stations to resampled fish, this is done within areas and by station.

## and use bootstrapped fish on the stations picked
data <- left_join(resampledStations, resampledFishData, 
  by = c("area", "stod"))

With the bootstrapped fish data we average the back-scatter within area, tally the fish we are using (note how this would work for unequal sample sizes), get the bootstrapped total EA by area, and finally prepare for disaggregating the total number by length, age and maturity. We use dplyr::right_join to add constant values for areas to each fish in the samples.

## average back-scatter and total EA by area
data %>%
  group_by(area) %>%
  summarize(meanSigma = mean(sigma),
    nSampled = n()) %>%
  left_join(eaSum, by = "area") %>%
  right_join(data, by = "area") -> data

Calculate the total number of fish within each area, distribute on the length, age and maturity groups according to bootstrapped sample composition in the areas (nFish), multiply by weight to get the corresponding biomass (bFish), re-group by area, length, age and maturity before preparing a aggregated summary, finally another re-group on the aggregation in order to prepare the stage for picking stock parameters of interest in the next step.

## find total number by area and distribute by length/age/maturity
data %>% 
  group_by(area) %>%
  mutate(totNfish = EA/meanSigma,
    nFish = totNfish/nSampled,
    bFish = nFish*w) %>%
  group_by(area, l, a, m) %>%
  summarize(nFish = sum(nFish),
    bFish = sum(bFish)) %>% group_by(area) -> bootAggr

Now we apply our function bionechi::pick_stock_para to get a selection of stock parameters out of object bootAggr, i.e.:

SSB, spawning stock biomass,
SSN, spawning stock numbers,
immN, immature numbers,
immB, immature biomass, and for selected age groups
immN{12} and immB{12}, numbers and biomass of age group 1 and 2 immatures,
SSB{234} and SSN{234}, numbers and biomass of age group 2, 3 and 4 in the spawning stock.

## pick a range of stock parameters, summarized in own bionechi-function 
bootByArea <- pick_stock_para(bootAggr)

To the stock parameters we add the bootstrapped echo abundance by area.

## get the EA by area as well 
bootByArea %>% 
  right_join(eaSum, by = "area") -> bootByArea

Find the totals of replicated stock parameters and echo abundance.

## get the totals for all parameters
bootByArea %>% 
  summarize_all(sum, na.rm = TRUE) %>% 
  mutate(area = "all") -> bootAll

Bind together results by area and the total result.

## bind the totals to the area results (in a sense a no-no, but handy here)
bootByArea %>% mutate(area = as.character(area)) %>%
  bind_rows(bootAll) -> bootBoth

Calculate proportions of selected stock parameters, tidy up the results by reshaping a wide data frame to a lean and mean three column result. This finishes off the multi-core lapply.

## calculate a few proportions, tidy/wrangle the output also.
## Note we keep zero entries for missing stock components.
## since they may or may not be present in the bootstrap replicates.
bootBoth %>%
  mutate(pimmN1 = immN1/immN, pimmB1 = immB1/immB,
    pimmN2 = immN2/immN, pimmB2 = immB2/immB,
    pSSN2 = SSN2/SSN, pSSB2 = SSB2/SSB,
    pSSN3 = SSN3/SSN, pSSB3 = SSB3/SSB,
    pSSN4 = SSN4/SSN, pSSB4 = SSB4/SSB) %>%
  gather(para, value, -area) %>%
  mutate(value = ifelse(is.na(value), 0, value))
}, mc.cores = nCores)

Now run all of the above in the background in a hidden chunk before we continue.

## do this in a multi-core lapply 
bootResult <- mclapply(1:nReps, function(x) {

## bootstrap rectangle EA within areas and sum by area

rdata %>%
  group_by(area) %>%
  sample_frac(size = 1, replace = TRUE) %>%
  summarize(EA = sum(EA)) -> eaSum

## bootstrap fish on stations

fishData %>%
  group_by(stod) %>%
  sample_frac(size = 1, replace = TRUE) -> resampledFishData

## bootstrap stations

fishData %>%
  select(area, stod) %>%
  distinct() %>%
  group_by(area) %>%
  sample_frac(size = 1, replace = TRUE) -> resampledStations

## and use bootstrapped fish on the stations picked

data <- left_join(resampledStations, resampledFishData, 
  by = c("area", "stod"))

## average back-scatter and total EA by area

data %>%
  group_by(area) %>%
  summarize(meanSigma = mean(sigma),
    nSampled = n()) %>%
  left_join(eaSum, by = "area") %>%
  right_join(data, by = "area") -> data

## find total number by area and distribute by length/age/maturity

data %>% 
  group_by(area) %>%
  mutate(totNfish = EA/meanSigma,
    nFish = totNfish/nSampled,
    bFish = nFish*w) %>%
  group_by(area, l, a, m) %>%
  summarize(nFish = sum(nFish),
    bFish = sum(bFish)) %>% group_by(area) -> bootAggr

## pick a range of stock parameters, summarized in own bionechi-function 

bootByArea <- pick_stock_para(bootAggr)

## get the EA by area as well 

bootByArea %>% 
  right_join(eaSum, by = "area") -> bootByArea

## get the totals for all parameters

bootByArea %>% 
  summarize_all(sum, na.rm = TRUE) %>% 
  mutate(area = "all") -> bootAll

## bind the totals to the area results (in a sense a no-no, but handy here)

bootByArea %>% mutate(area = as.character(area)) %>%
  bind_rows(bootAll) -> bootBoth

## calculate a few proportions, tidy/wrangle the output also.
## Note we keep zero entries for missing stock components.
## since they may or may not be present in the bootstrap replicates.

bootBoth %>%
  mutate(pimmN1 = immN1/immN, pimmB1 = immB1/immB,
    pimmN2 = immN2/immN, pimmB2 = immB2/immB,
    pSSN2 = SSN2/SSN, pSSB2 = SSB2/SSB,
    pSSN3 = SSN3/SSN, pSSB3 = SSB3/SSB,
    pSSN4 = SSN4/SSN, pSSB4 = SSB4/SSB) %>%
  gather(para, value, -area) %>%
  mutate(value = ifelse(is.na(value), 0, value))
}, mc.cores = nCores)

Steps after the bootstrap

We bind the bootstrap replicate results together yielding a 3 by number of replicates by (nareas + 1) by number of parameters data frame, save in an rds-file if desired.

bootResult <- bind_rows(bootResult)

## 
#saveRDS(bootResult, file = "bootResult.rds")
## to read in, e.g.:
## bootResult <- readRDS("bootResult.rds")

We scale the results for echo abundance to square kilometers (length to target strength (TS) to back-scatter relationships return back-scattering cross section in meters squared), biomass to thousands of tonnes (weights are given in grammes in the data base) and the numbers to billions. We use stringr::str_detect in an ifelse to pick parameters with names starting with the strings 'imm' or 'SS', i.e. of numbers or biomass either immature or from the spawning stock , so as to leave out parameter names starting with p, i.e. leave intact the proportions we choose to calculate. Save the bootstrap summary in an rds-file if desired.

## summarize the results, scale to square kilometers
## thousands of tonnes, billions of individuals
## use 'stringr:str_detect' in 'ifelse' to apply appropriate scaling 

bootResult %>%  
  mutate(value = ifelse(para == "EA", value/1e6, value), 
    value = ifelse(str_detect(para, "^imm"), value/1e9, value),
    value = ifelse(str_detect(para, "^SS"), value/1e9, value)) %>%
  group_by(area, para) %>% 
  do(boot_stats(.$value)) -> bootSummary

#saveRDS(bootSummary, file = "bootSummary.rds")

Tables out

We spool out the main results, more detail can be gotten by studying bootSumary.

## spool out main results

knitr::kable(bootSummary %>% filter(para == "EA") %>% 
  select(area:pt50) %>% select(-pt25))

knitr::kable(bootSummary %>% filter(para == "SSB") %>% 
  select(area:pt50) %>% select(-pt25))