knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 6, fig.height = 4 ) library(diagram)
The survdat
package is designed to work with the Northeast Fisheries Science Center's (NEFSC) bottom trawl surveys. The NEFSC has been conducting standardized bottom trawl surveys in the fall since 1963 and spring since 1968. The surveys follow a stratified random design. Fish species and several invertebrate species are enumerated on a tow by tow basis [@Azarovitz_1981]. The data are housed in the NEFSC's survey database (SVDBS) maintained by the Ecosystem Survey Branch.
The get_survdat_data
function will query the NEFSC survey database (SVSDBS) and apply the appropriate calibration factors associated with gear or vessel differences throughout the time series. However, end users are usually interested in more derived estimates of biomass or abundance that the tow by tow data retrieved via get_survdat_data
. The stratified random design of the survey allows for the calculation of a stratified mean biomass or abundance for a species or group of species.
To facilitate the calculation of these stratified means, the survdat
package has the function calc_stratified_mean
. This function is actually a wrapper of several intermediate functions.
par(mar = rep(1, 4)) openplotmat(main = '') pos <- coordinates(c(5, 5, 5, 5)) #Arrows apos <- 0.75 #Boxes text_size <- 0.7 edge <- 0.07 #Colors data.col <- 'grey' fun.col <- 'light grey' obj.col <- 'white' big.col <- 'grey' #Main Flow straightarrow(from = pos[1, ] - c(0, 0.04), to = pos[2, ] - c(0.06, 0.04), arr.pos = apos) straightarrow(from = pos[8, ], to = pos[9, ] - c(0.04, 0), arr.pos = apos) straightarrow(from = pos[9, ], to = pos[10, ] - c(0.04, 0), arr.pos = apos) straightarrow(from = pos[10, ], to = pos[15, ], arr.pos = apos) #Strata_prep box in background textrect(mid = pos[7, ] + c(0.095, 0), radx = .18, rady = .32, lab = '', shadow.size = 0, box.col = big.col) text(pos[2, ][1], pos[2, ][2], 'strat_prep', cex = text_size) #Polygon/area flow bentarrow(from = pos[11, ] + c(0, 0.14), to = pos[12, ] + c(0, 0.01)) straightarrow(from = pos[12, ] + c(0, 0.01), to = pos[13, ] + c(-0.04, 0.01), arr.pos = apos) #Optional post stratify bentarrow (from = pos[1, ] - c(0, 0.06), to = pos[7, ] + c(0, 0.05), path = 'H', lty = 3) bentarrow (from = pos[11, ] + c(0, 0.16), to = pos[7, ] + c(0, 0.05), path = 'H', lty = 3) straightarrow(from = pos[7, ] + c(0, 0.05), to = pos[8, ] + c(-0.04, 0.05), arr.pos = apos, lty = 3) #Boxes #Functions textrect(mid = pos[10, ], radx = edge, rady = edge, lab = 'strat_mean', cex = text_size, shadow.size = 0, box.col = fun.col) textrect(mid = pos[7, ] + c(0, 0.05), radx = edge, rady = edge, lab = 'post_strat', cex = text_size, shadow.size = 0, box.col = fun.col) textrect(mid = pos[12, ] + c(0, 0.01), radx = edge, rady = edge, lab = 'get_area', cex = text_size, shadow.size = 0, box.col = fun.col) #Data objects textround(mid = pos[1, ] - c(0, 0.05), radx = edge - .08, rady = edge, lab = 'survdat', cex = text_size, shadow.size = 0, box.col = data.col) textround(mid = pos[11, ] + c(0, 0.15), radx = edge - .08, rady = edge, lab = 'areaPolygon', cex = text_size, shadow.size = 0, box.col = data.col) #R Objects texthexa(mid = pos[9, ], radx = edge, rady = edge, lab = 'prepData', cex = text_size, shadow.size = 0, box.col = obj.col) texthexa(mid = pos[8, ] + c(0, 0.05), radx = edge, rady = edge, lab = 'poststratData', cex = text_size, shadow.size = 0, box.col = obj.col) texthexa(mid = pos[15, ], radx = edge, rady = edge, lab = 'stratmeanData', cex = text_size, shadow.size = 0, box.col = obj.col) texthexa(mid = pos[13, ] + c(0, 0.01), radx = edge, rady = edge, lab = 'strataArea', cex = text_size, shadow.size = 0, box.col = obj.col) #Legend straightarrow(from = pos[16, ] + c(0, 0.04) - c(0.08, 0), to = pos[16, ] + c(0, 0.04), lty = 1, arr.pos = 1) straightarrow(from = pos[16, ] - c(0.08, 0.04), to = pos[16, ] - c(0, 0.04), lty = 3, arr.pos = 1) textempty(mid = pos[16, ] + c(0, 0.04), lab = 'primary \npathway', cex = text_size) textempty(mid = pos[16, ] - c(0, 0.04) + c(0, 0), lab = 'optional \npathway', cex = text_size) textround(mid = pos[17, ], radx = 0.02, rady = 0.04, lab = 'data', cex = text_size, shadow.size = 0, box.col = data.col) textrect(mid = pos[18, ], radx = 0.04, rady = 0.04, lab = 'function', cex = text_size, shadow.size = 0, box.col = fun.col) texthexa(mid = pos[19, ], radx = 0.05, rady = 0.04, lab = 'R object', cex = text_size, shadow.size = 0, box.col = obj.col)
strat_prep
The first function called is strat_prep
. As the name implies, strat_prep
prepares the data set so that the stratified means can be calculated. Nested within this function are the post_strat
and get_area
functions. The post_strat
function is used if not following the stratified design of the survey. It is automatically called if the user inputs an sf
object rather than the specified default 'NEFSC strata'. When this occurs, the R object poststratData
generated by post_strat
will replace the survdat
data object and is ultimately passed through to the prepData
object. This will also turn on the poststratFlag
which is used by the strat_mean
function.
At this point you can also filter the data using the filterBySeason
or filterByArea
arguments. The first will allow you to select one of three options: 'FALL', 'SPRING', or 'all'. The default option is 'all'. Note that selecting both the spring and fall survey will combine the two surveys. So if you want to have separate means for both surveys you need to run them independently. The second filter will allow you to select a subset of strata to use in your analysis.
The rest of the prep work done by strat_prep
is counting the number of tows per strata and the proportional weight of the strata. This adds the columns ntows
and W.h
to the data set. The number of tows (ntows
) is simply the length of unique station records per stratum:
# Count the number of stations in each year for each Region data.table::setkey(stations, YEAR, STRAT) stations[, ntows := length(STATION), by = key(stations)]
Proportional weights of the strata are based on their area. The area of the strata are calculated using the get_area
function which uses a Lambert Conformal Conic projection. The get_area
function creates a list of strata and their corresponding areas (A) which strat_prep
uses to calculate the relative weight (W) of each strata h as:
\begin{equation} \label{PropArea} \begin{split} W_h = \frac{A_h}{\sum{A_h}} \end{split} \end{equation}
After counting the number of tows and calculating the relative weight of each strata, the resulting prepData
is then passed to the strat_mean
function.
strat_mean
Once the data has been prepared using strat_prep
the real calculations are carried out in the function strat_mean
. This function will remove duplicate catch records that may be present due to length data, merge sexed species (or leave separate if specified), calculate the total number of stations for the year, subset the species list if desired, and calculate the stratified mean biomass and adundance along with the associated variance.
The first few actions are straightforward. Length data is removed to ensure each catch is accounted for once.
#Remove length data if present data.table::setkey(stratmeanData, CRUISE6, STRATUM, STATION, SVSPP, CATCHSEX) stratmeanData <- unique(stratmeanData, by = key(stratmeanData)) stratmeanData[, c('LENGTH', 'NUMLEN') := NULL]
Similarly, catch from sexed species such as spiny dogfish are merged or kept separate. The default is to merge them but a user may specify to keep them separate by setting the mergesexFlag
to FALSE. Sexed species are kept separate by changing their group designation to a concatenation of their group name and catch sex.
#Merge sex or keep separate if(mergesexFlag == F) stratmeanData[, group := paste(group, CATCHSEX, sep = '')] data.table::setkey(stratmeanData, CRUISE6, strat, STATION, group) stratmeanData[, BIOMASS := sum(BIOMASS), by = key(stratmeanData)] stratmeanData[, ABUNDANCE := sum(ABUNDANCE), by = key(stratmeanData)] stratmeanData <- unique(stratmeanData, by = key(stratmeanData))
The total number of tows per year is the sum of ntows
for all of the strata sampled for a given year.
#Calculate total number of stations per year data.table::setkey(stratmeanData, strat, YEAR) N <- unique(stratmeanData, by = key(stratmeanData)) N <- N[, sum(ntows), by = 'YEAR'] data.table::setnames(N, 'V1', 'N')
Species or group list can be subsetted using the filterByGroup
argument.
Finally, the stratified mean biomass is calculated as:
\begin{equation} \label{Stratifed} \bar{B} = \sum_{h = 1}^{L}{W_h\bar{B_h}} \end{equation}
where $\bar{B}$ is the stratified mean biomass, $W_h$ and $\bar{B_h}$ the relative weight and mean biomass from stratum $h$, and $L$ the total number of strata. $\bar{B_h}$ is calculated as:
\begin{equation} \label{mean biomass} \bar{B_h} = \frac{\sum_{i = 1}^n{B_i}}{n_h} \end{equation}
where $B_i$ is the biomass at station $i$ and $n_h$ the number of stations in stratum $h$.
Additionally, variance is calculated as:
\begin{equation} \label{stratified variance} s_{\bar{B}}^2 = \sum_{h = 1}^L{\frac{W_{h}^{2} s_{h}^{2}}{n_{h}}} \end{equation}
where $s_{\bar{B}}^2$ is the variance of the stratified mean biomass and $s_h^2$ the variance of the mean for stratum $h$. $s_h^2$ is calculated as:
\begin{equation} \label{strata variance} s_h^2 = \frac{{(B_i - \bar{B_h})}^2}{n_h - 1} \end{equation}
If the poststratFlag
was set to TRUE because a different stratification was used rather than the survey design, the variance calculation becomes:
\begin{equation} \label{penalty variance} s_{\bar{B}}^2 = \frac{\sum_{h = 1}^L{s_{h}^{2} W_h}}{N} + \sum_{h = 1}^L{\frac{(1 - W_h)s_h^2}{N^2}} \end{equation}
where $N$ is the total number of stations for the year.
The final calculation for the calc_stratified_mean
function is the standard error of the mean which is:
\begin{equation} \label{standard error} s_{\bar{B}} = \sqrt{s_{\bar{B}}^2} \end{equation}
Similar calculations are carried out for abundance with numbers replacing biomass in the equations above.
The default output from the calc_stratified_mean
function is a wide table with a row for every year/group combination. You can elect a tidy format (long) instead by specifying the argument tidy
= TRUE. This will melt the table down so that each year/group combo will now have a seperate row for stratified mean biomass, stratified mean abundance, variance of the stratified mean biomass, and variance of the stratified mean abundance. The tidy data will also append a units column to the data set. The units for the stratified mean biomass is $kg tow^-1$ while the stratified mean abundance is $numbers tow^-1$. Note that the tidy data set does not include the standard errors.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.