Vignette to document how capelin surveys have been bootstrapped according to a scheme described in WKICE2015. Briefly, the method consists of the following:
EA
.EAs
and fish sampling stations to areas (or strata).EAs
, stations and fish on stations in parallel 10 thousand times, estimating the SSB and other stock parameters by area for each replicate.Naturally, our package bionechi
, out-commented pre-amble shows how to get from scratch. We use dplyr
, tidyr
from the so-called tidyverse
, stringr
as well for some string manipulation. For the bootstrap we go multi-core and use the base package parallel
.
#install.packages("devtools", repos = "https://cran.hafro.is") #library(devtools) #install_github("sigurdurthorjonsson/bionechi") # 'devtools::check notes this could be removed library(bionechi) library(dplyr) ; library(tidyr) ; library(stringr) ## assume 4 cores are present, adjust as necessary library(parallel)
#install.packages("devtools", repos = "https://cran.hafro.is") #library(devtools) #install_github("sigurdurthorjonsson/bionechi", build_vignettes = TRUE) library(bionechi) library(dplyr) ; library(tidyr) ; library(stringr) ## assume 4 cores are present, adjust as necessary library(parallel)
Data are rdata
with rectangle echo abundance and fishData
with fish biology allocated to areas (current example as in January 2016), as prepared and documented in earlier bionechi-vignettes[to be added].
#getwd() # when doing the vignette-thing find stuff relative to pkg/vignettes-dir rdata <- readRDS("../data/rdata.rds") fishData <- readRDS("../data/fishData.rds")
The first few lines of the two data sets used in the bootstrap are shown below. The rectangle data consist of r
for the rectangle code for smáreitur/sr in this instance, cap
is the rectangle average NASC, A
is the full geographical area of the rectangle,p
is the calculated proportion of the area within a perimeter set on the survey, EA
is the product of the NASC, area and propotion of area assumed to hold fish, and finally area
is the area or stratum code:
knitr::kable(head(rdata,5))
The fish data consist of stod
or the station, l
for length in cm, w
the ungutted weight in grams, m
is a binary classification into immatures and matures, TS
is the target strenght according to the relationsship TS = 19.1 * log(L) - 74.5, sigma
is the back-scattering cross section, and finally area
is the area or stratum code.
knitr::kable(head(fishData, 5))
Set the number of replicates, one thousand in vignette, ten thousand in practice sofar. Set number of cores, here to that available on my machine, adjust down if necessary, possibly up if higher number of processors is available.
nReps <- 1e3 # set to one since devtools::check complains about: # spawning simultaneous processes, # consider increasing when extra processores are available nCores <- 1
We use parallel::mclapply
to bootstrap the tables grouped by area. We break the operation down into logical steps below.
Start the mclapply-call by distributing replicates on the available cores (i.e. 4 at a time in this case), return the list bootResult
from the mclapply
-call, nReps
long.
## do this in a multi-core lapply bootResult <- mclapply(1:nReps, function(x) {
Use dplyr::sample_frac
to pick rectangles at random with replacement within each area, and sum the bootstrapped echo abundance by area.
## bootstrap rectangle EA within areas and sum by area rdata %>% group_by(area) %>% sample_frac(size = 1, replace = TRUE) %>% summarize(EA = sum(EA)) -> eaSum
Similarily, we pick fish on stations at random with replacement, for all stations (column name stod
), hold in data frame resampledFishData
.
## bootstrap fish on stations fishData %>% group_by(stod) %>% sample_frac(size = 1, replace = TRUE) -> resampledFishData
Each station typically has 100 capelin, we get the distinct station numbers, and pick at random with replacement within area, assign the station number we are going to use to resampledStations
.
## bootstrap stations fishData %>% select(area, stod) %>% distinct() %>% group_by(area) %>% sample_frac(size = 1, replace = TRUE) -> resampledStations
Join the resampled stations to resampled fish, this is done within areas and by station.
## and use bootstrapped fish on the stations picked data <- left_join(resampledStations, resampledFishData, by = c("area", "stod"))
With the bootstrapped fish data we average the back-scatter within area, tally the fish we are using (note how this would work for unequal sample sizes), get the bootstrapped total EA
by area, and finally prepare for disaggregating the total number by length, age and maturity. We use dplyr::right_join
to add constant values for areas to each fish in the samples.
## average back-scatter and total EA by area data %>% group_by(area) %>% summarize(meanSigma = mean(sigma), nSampled = n()) %>% left_join(eaSum, by = "area") %>% right_join(data, by = "area") -> data
Calculate the total number of fish within each area, distribute on the length, age and maturity groups according to bootstrapped sample composition in the areas (nFish
), multiply by weight to get the corresponding biomass (bFish
), re-group by area, length, age and maturity before preparing a aggregated summary, finally another re-group on the aggregation in order to prepare the stage for picking stock parameters of interest in the next step.
## find total number by area and distribute by length/age/maturity data %>% group_by(area) %>% mutate(totNfish = EA/meanSigma, nFish = totNfish/nSampled, bFish = nFish*w) %>% group_by(area, l, a, m) %>% summarize(nFish = sum(nFish), bFish = sum(bFish)) %>% group_by(area) -> bootAggr
Now we apply our function bionechi::pick_stock_para
to get a selection of stock parameters out of object bootAggr
, i.e.:
## pick a range of stock parameters, summarized in own bionechi-function bootByArea <- pick_stock_para(bootAggr)
To the stock parameters we add the bootstrapped echo abundance by area.
## get the EA by area as well bootByArea %>% right_join(eaSum, by = "area") -> bootByArea
Find the totals of replicated stock parameters and echo abundance.
## get the totals for all parameters bootByArea %>% summarize_all(sum, na.rm = TRUE) %>% mutate(area = "all") -> bootAll
Bind together results by area and the total result.
## bind the totals to the area results (in a sense a no-no, but handy here) bootByArea %>% mutate(area = as.character(area)) %>% bind_rows(bootAll) -> bootBoth
Calculate proportions of selected stock parameters, tidy up the results by reshaping a wide data frame to a lean and mean three column result. This finishes off the multi-core lapply.
## calculate a few proportions, tidy/wrangle the output also. ## Note we keep zero entries for missing stock components. ## since they may or may not be present in the bootstrap replicates. bootBoth %>% mutate(pimmN1 = immN1/immN, pimmB1 = immB1/immB, pimmN2 = immN2/immN, pimmB2 = immB2/immB, pSSN2 = SSN2/SSN, pSSB2 = SSB2/SSB, pSSN3 = SSN3/SSN, pSSB3 = SSB3/SSB, pSSN4 = SSN4/SSN, pSSB4 = SSB4/SSB) %>% gather(para, value, -area) %>% mutate(value = ifelse(is.na(value), 0, value)) }, mc.cores = nCores)
Now run all of the above in the background in a hidden chunk before we continue.
## do this in a multi-core lapply bootResult <- mclapply(1:nReps, function(x) { ## bootstrap rectangle EA within areas and sum by area rdata %>% group_by(area) %>% sample_frac(size = 1, replace = TRUE) %>% summarize(EA = sum(EA)) -> eaSum ## bootstrap fish on stations fishData %>% group_by(stod) %>% sample_frac(size = 1, replace = TRUE) -> resampledFishData ## bootstrap stations fishData %>% select(area, stod) %>% distinct() %>% group_by(area) %>% sample_frac(size = 1, replace = TRUE) -> resampledStations ## and use bootstrapped fish on the stations picked data <- left_join(resampledStations, resampledFishData, by = c("area", "stod")) ## average back-scatter and total EA by area data %>% group_by(area) %>% summarize(meanSigma = mean(sigma), nSampled = n()) %>% left_join(eaSum, by = "area") %>% right_join(data, by = "area") -> data ## find total number by area and distribute by length/age/maturity data %>% group_by(area) %>% mutate(totNfish = EA/meanSigma, nFish = totNfish/nSampled, bFish = nFish*w) %>% group_by(area, l, a, m) %>% summarize(nFish = sum(nFish), bFish = sum(bFish)) %>% group_by(area) -> bootAggr ## pick a range of stock parameters, summarized in own bionechi-function bootByArea <- pick_stock_para(bootAggr) ## get the EA by area as well bootByArea %>% right_join(eaSum, by = "area") -> bootByArea ## get the totals for all parameters bootByArea %>% summarize_all(sum, na.rm = TRUE) %>% mutate(area = "all") -> bootAll ## bind the totals to the area results (in a sense a no-no, but handy here) bootByArea %>% mutate(area = as.character(area)) %>% bind_rows(bootAll) -> bootBoth ## calculate a few proportions, tidy/wrangle the output also. ## Note we keep zero entries for missing stock components. ## since they may or may not be present in the bootstrap replicates. bootBoth %>% mutate(pimmN1 = immN1/immN, pimmB1 = immB1/immB, pimmN2 = immN2/immN, pimmB2 = immB2/immB, pSSN2 = SSN2/SSN, pSSB2 = SSB2/SSB, pSSN3 = SSN3/SSN, pSSB3 = SSB3/SSB, pSSN4 = SSN4/SSN, pSSB4 = SSB4/SSB) %>% gather(para, value, -area) %>% mutate(value = ifelse(is.na(value), 0, value)) }, mc.cores = nCores)
We bind the bootstrap replicate results together yielding a 3 by number of replicates by (nareas + 1) by number of parameters data frame, save in an rds-file if desired.
bootResult <- bind_rows(bootResult) ## #saveRDS(bootResult, file = "bootResult.rds") ## to read in, e.g.: ## bootResult <- readRDS("bootResult.rds")
We scale the results for echo abundance to square kilometers (length to target strength (TS) to back-scatter relationships return back-scattering cross section in meters squared), biomass to thousands of tonnes (weights are given in grammes in the data base) and the numbers to billions. We use stringr::str_detect
in an ifelse
to pick parameters with names starting with the strings 'imm' or 'SS', i.e. of numbers or biomass either immature or from the spawning stock , so as to leave out parameter names starting with p
, i.e. leave intact the proportions we choose to calculate. Save the bootstrap summary in an rds-file if desired.
## summarize the results, scale to square kilometers ## thousands of tonnes, billions of individuals ## use 'stringr:str_detect' in 'ifelse' to apply appropriate scaling bootResult %>% mutate(value = ifelse(para == "EA", value/1e6, value), value = ifelse(str_detect(para, "^imm"), value/1e9, value), value = ifelse(str_detect(para, "^SS"), value/1e9, value)) %>% group_by(area, para) %>% do(boot_stats(.$value)) -> bootSummary #saveRDS(bootSummary, file = "bootSummary.rds")
We spool out the main results, more detail can be gotten by studying bootSumary
.
## spool out main results knitr::kable(bootSummary %>% filter(para == "EA") %>% select(area:pt50) %>% select(-pt25)) knitr::kable(bootSummary %>% filter(para == "SSB") %>% select(area:pt50) %>% select(-pt25))
At this point, a traditional exclamation!
print("Bittenú, húrra")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.