knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 6,
  fig.height = 6
)

Pre-processing of IBTS data to get desired data for MEPS Table 1

These are the steps for pre-processing the IBTS data to retain just the desired elements for the analyses in the MEPS paper. Should be useful for repeating the analyses with updated data (if so it would be good to use the icesDATRAS package to automate the downloading of the data). This may also be useful for anyone wanting to extract data and do similar analyses. This is not functionalised, but is better to work through the steps in an Rmarkdown-type format to understand and check what is going on.

library(sizeSpectra)

Data were extracted from the IBTS DATRAS website by Julia Blanchard who then undertook some initial processing, as detailed in Supplementary Material of MEPS paper, and also included with the species-specific length-weight parameters from Fung et al. (2012). The data are saved within this package as dataOrig (see data-raw/IBTS-data.R).

First, understand the data:

dim(dataOrig)
names(dataOrig)
dataOrig[1:5,1:7]
dataOrig[1:5,8:13]

summary(dataOrig)

Note that LngtClas is in mm, not cm, but that a and b are the length-weight coefficients for the length being in cm. Will use cm as units later.

Some columns are duplicated and we just want to keep the useful ones. AphiaID is a numerical code for each species. Need to know the number of areas, but don't need to keep Area.

Survey and Quarter are the same for all entries, and we don't need to keep Area, just need the number of areas.

numAreas = length(unique(dataOrig$Area))
numAreas
colsKeep = c("Year",
             "AphiaID",
             "LngtClas",
             "CPUE_number_per_hour",
             "a",
             "b",
             "weight_g",
             "CPUE_bio_per_hour")
colsDiscard = setdiff(names(dataOrig), colsKeep)
colsDiscard

Note that data will change a lot in the following code.

data = sizeSpectra::s_select(dataOrig, colsKeep)   # uses Sebastian Kranz's s_dplyr_funcs.r
data
# str(data)
summary(data)
min(data$CPUE_number_per_hour)

So no negative CPUE values or spurious weights. There are a lot of zero CPUE values:

sum(data$CPUE_number_per_hour == 0)

Want to end up with data in a standard format (based on some original analysis I did when writing the code). Need to rename some of the headings, make the lengths in cm not mm, and (for helpfulness) order by Year, SpecCode and then Lgnt:

  1. Rename the columns:
if(sum( colsKeep != c("Year", "AphiaID", "LngtClas", "CPUE_number_per_hour",
    "a", "b", "weight_g", "CPUE_bio_per_hour")) > 0)
       { stop("Need to adjust renaming") }
names(data) = c("Year", "SpecCode", "LngtClass", "Number", "LWa", "LWb",
         "bodyMass", "CPUE_bio_per_hour")
# CPUE_bio_per_hour is Number * bodyMass
  1. Make cm not mm:
data$LngtClass = data$LngtClass/10
  1. Rearrange the order to be more intuitive:
data = dplyr::arrange(data, Year, SpecCode, LngtClass)
data

That shows that we have a lot of (i) repeated values that can be amalgamated (presumably repeated because at one point the data included details about trawls, or it's just how the data were obtained), (ii) lots of Number == 0 that we can discard, though keep for now since will help verify the binning.

(i) So, each row represents a combination of Year, SpecCode, LngtClass, but these aren't unique. For example, looking at just one species for one year for one length class:

exampleSp = dplyr::filter(data, Year == 1986, SpecCode == 105814, LngtClass == 60)
exampleSp

So, yes, we have multiple counts of 60cm fish of this species, which we can just aggregate together. Do this for all years, species and lengths:

data = dplyr::summarise(dplyr::group_by(data,
                                        Year,
                                        SpecCode,
                                        LngtClass),
                        "Number" = sum(Number)/numAreas,
                        "LWa" = unique(LWa),
                        "LWb" = unique(LWb),
                        "bodyMass" = unique(bodyMass))
data
summary(data)

dplyr::filter(data, SpecCode == 105814, Year == 1986, LngtClass == 60)

So Number here correctly equals the sum of the first two rows of exampleSp divided by seven (areas).

Number is the average number (of each species and length) caught per hour of trawling across all seven areas.

Just confirm the calculations for bodyMass (body mass of an individual of that LngtClass) since they were done during preprocessing; should get the same answer, using species-specific length-weight coversions.

data = dplyr::mutate(data,
                     bodyMass2 = LWa * LngtClass^LWb)
if(max(abs(data$bodyMass2 - data$bodyMass)) > 0.0001) stop("Check conversions")
data = dplyr::select(data, -bodyMass2)              # don't keep the confirming column

Now only include body-mass classes above 4 g, following Blanchard et al. (2005), since data are unreliable for smaller organisms:

range(data$LngtClass)
range(data$bodyMass)
sum(data$bodyMass == 0)   # 2549
sum(data$bodyMass < 4 )   # 6893
data = dplyr::filter(data, bodyMass >= 4)
range(data$bodyMass)
data
summary(data)

Total number of fish in this dataset is

sum(data$Number)

The unique length classes are:

sort(unique(data$LngtClass))
diff(sort(unique(data$LngtClass)))

The 0.5-cm length classes are only for two species, Atlantic Herring (code 126417) and European Sprat (code 126425), as confirmed here (no differences of 0.5cm):

temp = dplyr::filter(data, !(SpecCode %in% c(126417, 126425)))
unique(diff(sort(unique(temp$LngtClass))))

Aside -- species names and codes are in specCodeNames (though it needs updating as more species in the data than listed here):

specCodeNames
length(unique(specCodeNames$speccode))   # checking speccode are unique

Need this to stop earlier groups being kept (can mess up later code):

data = dplyr::ungroup(data)

These next commands here (which are not run in this vignette) are to save IBTS_data in the package (which has already been run once to build the package). Rename and save data with a meaningful name for your own data.

IBTS_data = data
usethis::use_data(IBTS_data, overwrite = TRUE)

So we have the following, where each row is a unique combination of Year, SpecCode and LngtClass (cm, the minimum value of the 1-cm length bin [or 0.5-cm bin for Atlantic Herring and European Sprat]), and Number gives the number of individuals per hour of trawling observed for the combination. Parameters LWa and LWb are the length-weight cofficients for that species from Fung et al. (2012), bodyMass (g) is the resulting estimated body mass for an individual of that species and length class and Biomass (g h^-1^) is calculated here as the total biomass caught per hour of trawling for each row.

The resulting Table 1 is:

data_biomass <- dplyr::mutate(data,
                              Biomass = Number * bodyMass)
knitr::kable(rbind(data_biomass[1:6,],
                   data_biomass[(nrow(data_biomass)-5):nrow(data_biomass),
                                     ]),
             digits=c(0, 0, 0, 3, 4, 4, 2, 2))


andrew-edwards/sizeSpectra documentation built on June 28, 2023, 7:09 p.m.