knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 6, fig.height = 6 )
These are the steps for pre-processing the IBTS data to retain just the desired
elements for the analyses in the MEPS paper. Should be useful for repeating the
analyses with updated data (if so it would be good to use the icesDATRAS
package to automate the downloading of the data). This may also be useful for
anyone wanting to extract data and do similar analyses. This is not
functionalised, but is better to work through the steps in an Rmarkdown-type
format to understand and check what is going on.
library(sizeSpectra)
Data were extracted from the IBTS DATRAS website by Julia Blanchard who then
undertook some initial processing, as detailed in Supplementary
Material of MEPS paper, and also included with the species-specific length-weight
parameters from Fung et al. (2012). The data are saved within this package as dataOrig
(see
data-raw/IBTS-data.R).
First, understand the data:
dim(dataOrig) names(dataOrig) dataOrig[1:5,1:7] dataOrig[1:5,8:13] summary(dataOrig)
Note that LngtClas
is in mm, not cm, but that a
and b
are the
length-weight coefficients for the length being in cm. Will use cm as units
later.
Some columns are duplicated and we just want to keep the useful ones. AphiaID
is a numerical code for each species. Need to know the number of areas, but
don't need to keep Area
.
Survey
and Quarter
are the same for all entries, and we don't need to keep Area
,
just need the number of areas.
numAreas = length(unique(dataOrig$Area)) numAreas colsKeep = c("Year", "AphiaID", "LngtClas", "CPUE_number_per_hour", "a", "b", "weight_g", "CPUE_bio_per_hour") colsDiscard = setdiff(names(dataOrig), colsKeep) colsDiscard
Note that data
will change a lot in the following code.
data = sizeSpectra::s_select(dataOrig, colsKeep) # uses Sebastian Kranz's s_dplyr_funcs.r data # str(data) summary(data) min(data$CPUE_number_per_hour)
So no negative CPUE values or spurious weights. There are a lot of zero CPUE values:
sum(data$CPUE_number_per_hour == 0)
Want to end up with data
in a standard format (based on some original
analysis I did when writing the code). Need to rename some of the headings, make
the lengths in cm not mm, and (for helpfulness) order by Year
, SpecCode
and then Lgnt
:
if(sum( colsKeep != c("Year", "AphiaID", "LngtClas", "CPUE_number_per_hour", "a", "b", "weight_g", "CPUE_bio_per_hour")) > 0) { stop("Need to adjust renaming") } names(data) = c("Year", "SpecCode", "LngtClass", "Number", "LWa", "LWb", "bodyMass", "CPUE_bio_per_hour") # CPUE_bio_per_hour is Number * bodyMass
data$LngtClass = data$LngtClass/10
data = dplyr::arrange(data, Year, SpecCode, LngtClass) data
That shows that we have a lot of (i) repeated values that can be amalgamated
(presumably repeated because at one point the data included details about
trawls, or it's just how the data were obtained), (ii) lots of Number == 0
that we can discard, though keep for now since will help verify the binning.
(i) So, each row represents a combination of Year, SpecCode, LngtClass
, but
these aren't unique. For example, looking at just one species for one year for
one length class:
exampleSp = dplyr::filter(data, Year == 1986, SpecCode == 105814, LngtClass == 60) exampleSp
So, yes, we have multiple counts of 60cm fish of this species, which we can just aggregate together. Do this for all years, species and lengths:
data = dplyr::summarise(dplyr::group_by(data, Year, SpecCode, LngtClass), "Number" = sum(Number)/numAreas, "LWa" = unique(LWa), "LWb" = unique(LWb), "bodyMass" = unique(bodyMass)) data summary(data) dplyr::filter(data, SpecCode == 105814, Year == 1986, LngtClass == 60)
So Number
here correctly equals the sum of the first two rows of exampleSp
divided by seven (areas).
Number
is the average number (of each species and length) caught per
hour of trawling across all seven areas.
Just confirm the calculations for bodyMass
(body mass of an individual of that
LngtClass
) since they were done during preprocessing;
should get the same answer, using species-specific length-weight coversions.
data = dplyr::mutate(data, bodyMass2 = LWa * LngtClass^LWb) if(max(abs(data$bodyMass2 - data$bodyMass)) > 0.0001) stop("Check conversions") data = dplyr::select(data, -bodyMass2) # don't keep the confirming column
Now only include body-mass classes above 4 g, following Blanchard et al. (2005), since data are unreliable for smaller organisms:
range(data$LngtClass) range(data$bodyMass) sum(data$bodyMass == 0) # 2549 sum(data$bodyMass < 4 ) # 6893 data = dplyr::filter(data, bodyMass >= 4) range(data$bodyMass) data summary(data)
Total number of fish in this dataset is
sum(data$Number)
The unique length classes are:
sort(unique(data$LngtClass)) diff(sort(unique(data$LngtClass)))
The 0.5-cm length classes are only for two species, Atlantic Herring (code 126417) and European Sprat (code 126425), as confirmed here (no differences of 0.5cm):
temp = dplyr::filter(data, !(SpecCode %in% c(126417, 126425))) unique(diff(sort(unique(temp$LngtClass))))
Aside -- species names and codes are in specCodeNames
(though it needs updating as more species in the data than
listed here):
specCodeNames length(unique(specCodeNames$speccode)) # checking speccode are unique
Need this to stop earlier groups being kept (can mess up later code):
data = dplyr::ungroup(data)
These next commands here (which are not run in this vignette) are to save IBTS_data
in the package (which has already been run once to build the package). Rename
and save data
with a meaningful name for your own data.
IBTS_data = data usethis::use_data(IBTS_data, overwrite = TRUE)
So we have the following, where each row is a unique combination of Year
,
SpecCode
and LngtClass
(cm, the minimum value of the 1-cm length bin [or
0.5-cm bin for Atlantic Herring and European Sprat]), and Number
gives the number of individuals per hour
of trawling observed for the combination. Parameters LWa
and LWb
are the length-weight cofficients for that species from
Fung et al. (2012), bodyMass
(g) is the resulting estimated body mass for an
individual of that species and length class and Biomass
(g h^-1^) is calculated here as
the total biomass caught per hour of trawling for each row.
The resulting Table 1 is:
data_biomass <- dplyr::mutate(data, Biomass = Number * bodyMass) knitr::kable(rbind(data_biomass[1:6,], data_biomass[(nrow(data_biomass)-5):nrow(data_biomass), ]), digits=c(0, 0, 0, 3, 4, 4, 2, 2))
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.