knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
options(rmarkdown.html_vignette.check_title = FALSE) 

This example shows the data compilation of stomach contents observations into population diet after the initial data manipulation have been done (see example 1 and 2). The compilation includes assigning partly identified prey items to known preys, and calculation of the population diet from a weighed mean of the observed data. Both re-distribution of partly identified preys and raising the observation to population mean require stratification of data in space and time.

First data from the previous data manipulations ("Example 02") are loaded.

library(FishStomachs)
suppressMessages(library(dplyr))
suppressMessages(library(readr))

load(file=file.path(system.file( package = "FishStomachs"),"extdata","ex02.Rdata"))
s

do_detailed_output(s, by_sample_id=TRUE,to_screen=TRUE, label='Begin example 3')

Group prey species

The Multi species model, SMS, groups the prey species into commercial species and another species, both fish and invertebrates, into "Other food". For this, the NODC system is convenient due to its hierarchical structure. First, an input file is made to define the structure for grouping. Here, data files from the FishStomachs package are used to produce the key (data set NODC_split) to split/aggregate data.

a<-read_csv(file.path(system.file( package = "FishStomachs"),"extdata","species_info.csv"),
        col_types = cols()) %>% 
        filter(nodc>0 & prey_sp) %>%
        select(species,nodc)%>%distinct() %>%
        rename(species_group=species,First=nodc) %>% 
        mutate(Last=First,named=TRUE)
head(a)


b<-read_csv(file.path(system.file( package = "FishStomachs"),"extdata","other_food_nodc.csv"),
        col_types = cols()) %>%
        mutate(named=FALSE)
head(b)

NODC_split<-bind_rows(a,b)
NODC_split

The NODC_split data includes the named preys (named=TRUE) that are included in the SMS model. The other group of preys (named=FALSE ) includes other prey groups that are not fully species identified or species identified preys which belong to the group of "other food" where

# first NODC codes are added to the data set
s<-add_NODC_ID(s, predator_or_prey = c("predator", "prey")[1]) 
s<-add_NODC_ID(s, predator_or_prey = c("predator", "prey")[2])


s<-group_prey_species(s,NODC_split=NODC_split,show_allocations=FALSE)
do_detailed_output(s, by_sample_id=TRUE,to_screen=TRUE, label='After group_prey_species')

Data are now aggregated in the specified groups, as shown above.

Aggregation of stomach within the same sample_Id and predator size class

The individual stomachs from the same sample Id, predator and prey size classes are aggregated.

s<-aggregate_within_sample(s)
s
do_detailed_output(s, by_sample_id=TRUE,to_screen=TRUE, label="After aggregation")

The call to \code{aggregate_within_sample} replaces fish_id by the predator size class. In this case there are are only one fish_id within a sample id, so there is no changes except the use of predator and prey size classes.

Assigning strata, first step in estimating population diet

The average diet should basically be calculated as a stratified mean of the individual stomach content samples, weighted by strata abundances (indices) of the predator and areas of the strata. In this example, an average stomach contents is first calculate by ICES rectangles. When there was more than one sample within a rectangle the rectangle value was calculated as the weighted mean of the sample (haul) average values, the weighting factors being the number of stomachs from each haul.

The average stomach contents are then calculated for a larger area which in this case is the ICES roundfish areas consisting of a number of ICES rectangles. The mean strata_area is a weighted average of the rectangle values were weighted by the mean square root of survey catch rates within the rectangle; this reduces the influence of occasional very large catch rates.

Finally, the population diet is calculated from the average roundfish stomach contents weighted by the proportion of the predator population within the roundfish areas.

The temporal stratification is based on the year and quarter of the year.

The stratification is specified in the control object to STOMobs as shown below

#structure and default contents
print(get_control(s),show=c('Stratification'))

s<-edit_control(s,
     strata_area_exp=expression(paste('R',area,sep='-')),
     strata_sub_area_exp=expression(rectangle),
     strata_time_exp=expression(paste0(year, "-Q", quarter)),
     strata_year_back=expression(as.numeric(substr(stratum_time, 1, 4))),
     strata_quarter_back=expression(as.numeric(substr(stratum_time, 7, 8)))
    )

# to see the changes
# print(get_control(s),show=c('stratification'))

The "strata_sub_area_exp" is defined from the STOMobs field rectangle, and the "strata_area" from area (with an "R-" in front). The "strata_time_exp" is defined from the fields year and quarter. Note that definition of strata is provided as as R expression. The "strata_year_back" and "strata_quarter_back" are expression to derive year and quarter from the strata_time variable (used in Example 4)

s<-add_strata(s)

do_detailed_output(s,label='After add_strata', by_sample_id=TRUE,to_screen=TRUE)

# save data set for use in example 5
 save(s,file=file.path(system.file( package = "FishStomachs"),"extdata","ex03half.Rdata"))

Assigning partly identified prey items to identified preys.

Prey items are now named as model preys (named prey from data set NODC_split) or the unnamed preys "other", "other food", "other fish", "Clupeidae", "Gadidae" and "unknown". This assigning of partly identified preys is done using a key (data set "from_to_species.csv below") and the function make_from_to_species().

from_to <-make_from_to_species(inp_file=
        file.path(system.file( package = "FishStomachs"),"extdata","from_to_species.csv"))
from_to

The from_to data specify that first (order=1) preys identified as the group "Clupeidae" must be split proportionally to the known prey species Clupea harengus and Sprattus sprattus (if such are identified in the sample). Next (order=2) "Gadidae" is spilt into a range of gadoids. In the third step, "unknown" prey items are split into a range of prey groups, which includes both named and not named preys. This splitting og preys can be done at different strata levels, starting with sample level, sub_area level and area level.

do_detailed_output(s, by_sample_id=TRUE,to_screen=TRUE,use_criteria=FALSE,rel_weight=FALSE,
        label="Before reallocation of unindentified preys")


do_detailed_output(s, by_sample_id=TRUE,to_screen=TRUE,use_criteria=FALSE,rel_weight=TRUE,
        label="Before reallocation of unindentified preys")

# by sample
s<-redist_unidentified_prey_sp(s,dist_time=stratum_time,dist_area=sample_id,
        dist_pred_size=pred_size, do_only=c(1,2),from_to_species=from_to,
        by_prey_size=FALSE,remains_to_other = FALSE)

do_detailed_output(s, by_sample_id=TRUE,to_screen=TRUE,use_criteria=FALSE,rel_weight=TRUE,
        label="Before reallocation of unindentified preys")

The call to do_detailed_output() is changed such that it shows the full data set (use_criteria=FALSE), all data are aggregated irrespective of predator size and year and quarter ( by_sample_id=TRUE). This setting makes limited sense, however, it is made just to follow what happens.

The call to redist_unidentified_prey_sp() is quite complex, so please see help (?redist_unidentified_prey_sp).

The function redistributes partly identified preys to the distribution of the identified prey species from the same predator and predator size. The distribution key is made from an extract of data, which are filtered by the parameters dist_time, dist_area and dist_pred_size. In this example, a unique re-distribution key is made by each combination of stratum_time (year quarter combination), sample_id and predator size class. The by_prey_size=FALSE specifies that prey sizes is not used in the stratification, e.g. a Clupeida size 9999-9999 can be re-distributed to Clupea harengus with known sizes. The do_only=c(1,2) specifies that is only step number 1 and 2 in the from_to that should performed (i.e just redistribution of Clupeidae and Gadidae).

The messages from the redist_unidentified_prey_sp() say that 63 records with prey=Clupeidae were found, of which 32 could be reallocated to named Clupeidae species within the sample, but there are still 63 records where no named Clupeidae species within the sample. The unallocated records are kept as they were as parameter remains_to_other = FALSE. If set to TRUE, the remaining records would have been changed into "other".

Before the redistribution, there was Gadidae comprised 3.6 % of the total stomach contents weight. This is reduced to 2.3 % after the redistribution, while the stomach contents of the named Gadidae (Gadus morhua, Melanogrammus aeglefinus, Merlangius merlangus, Trisopterus esmarkii) have increased slightly.

The next calls to redist_unidentified_prey_sp() show how to widen the selection of data to make the allocation key from at sample level (previous example), to all samples within the rectangle, all samples within the roundfish area, and the full area of all samples. The last call make the allocation key from data by year and half year, whereas the previous calls used data by year and quarter, as that is used in the temporal stratification.

# by sub_area (rectangel)                                                         
s<-redist_unidentified_prey_sp(s,dist_time=stratum_time,dist_area=stratum_sub_area,
        dist_pred_size=pred_size, do_only=c(1,2),from_to_species=from_to,
        by_prey_size=FALSE,remains_to_other = FALSE, verbose=TRUE)

# by area (=roundfish area)                                                               
s<-redist_unidentified_prey_sp(s,dist_time=stratum_time,dist_area=stratum_area,
        dist_pred_size=pred_size, do_only=c(1,2,3),from_to_species=from_to,
        by_prey_size=FALSE,remains_to_other = FALSE, verbose=TRUE)

# all areas                                                               
s<-redist_unidentified_prey_sp(s,dist_time=stratum_time,dist_area='All',
        dist_pred_size=pred_size, do_only=c(1,2,3),from_to_species=from_to,
        by_prey_size=FALSE,remains_to_other = FALSE, verbose=TRUE)


# all areas and half-year (dist_time is changed)                                                             
s<-redist_unidentified_prey_sp(s,dist_time=paste(year,ifelse(quarter %in% c(1,2),'S1','S2')),
        dist_area='All',dist_pred_size=pred_size, do_only=c(1,2,3),from_to_species=from_to,
        by_prey_size=FALSE,remains_to_other = FALSE, verbose=TRUE)

There is still one occurrence of "Gadidae" that cannot be redistributed.

as.data.frame(subset(as.data.frame(s),prey_name=='Gadidae'))

The problem seems to be a small cod (0150-0199) which has eaten "Gadidae". Other cod within the same size class have not eaten named Gadidae (e.g. cod or whiting), so it is not possible to make a distribution key for that size class. In this case, this Gadidae is turned into "other food" in the next call to redist_unidentified_prey_sp().

# the same as above, but unallocated remains to "other food".                                                             
s<-redist_unidentified_prey_sp(s,dist_time=paste(year,ifelse(quarter %in% c(1,2),'S1','S2')),
        dist_area='All',dist_pred_size=pred_size, do_only=c(1,2,3),from_to_species=from_to,
        by_prey_size=FALSE,remains_to_other = TRUE, verbose=TRUE)

do_detailed_output(s, by_sample_id=TRUE,to_screen=TRUE,use_criteria=FALSE,rel_weight=TRUE,
        label="After reallocation of unindentified preys")

After the allocation of preys, data now includes the named species (for later use in models) and the groups "other" and "other fish" that will be converted to "other food" later. There are however still named preys with unknown sizes (9999-9999) that need to be allocated to known sizes.

Assigning size classes where missing

The assigning of size classes where missing is done in a similar way as the allocation of species name to partly identified prey items, but using the function redist_unidentifed_prey_lengths(). This function only works for the species provided by sel_preys in the control object

# a very complex way of getting a list of prey names!
keep_prey <-(read_csv(file.path(system.file( package = "FishStomachs"),"extdata","species_info.csv"),
        col_types = cols())  %>% 
        filter(prey_sp)  %>% select(species) %>% unique())$species
keep_prey #input is just a vector of prey names

s<-edit_control(s,sel_preys=keep_prey)

We can now do the re-allocation of missing size classes.

# by statum_area (=roundfish area) 
s<-redist_unidentified_prey_lengths(s,dist_time=stratum_time,dist_area=stratum_area,
      dist_pred_size=pred_size,
      remains_to_other = FALSE, # do not add records with missing length allocation to "other food"
      others_to_other=TRUE,      # do add other species that selected prey species to "other food"
      verbose=TRUE
      )

# within all areas                                                     
s<-redist_unidentified_prey_lengths(s,dist_time=stratum_time,dist_area='All',
      dist_pred_size=pred_size,remains_to_other = FALSE, verbose=TRUE)

#  within all areas, and year (ignoring quarter)                                                   
s<-redist_unidentified_prey_lengths(s=s,dist_time=substr(stratum_time,1,4),dist_area='All',
      dist_pred_size=pred_size,remains_to_other = FALSE, verbose=TRUE)

# as above, but with conversion of unallocated into "other"
s<-redist_unidentified_prey_lengths(s=s,dist_time=substr(stratum_time,1,4),dist_area='All',
      dist_pred_size=pred_size,remains_to_other = TRUE, verbose=TRUE)

do_detailed_output(s, by_sample_id=TRUE,to_screen=TRUE,use_criteria=FALSE,rel_weight=TRUE)

There was a very small amount of Merlangius merlangus that could not be allocated, and these are allocated into "other" in the last function call. The prey species are now the selected ones and "other".

Calculating population diet

The average diet should basically be calculated as a stratified mean of the individual stomach content samples, weighted by strata abundances (e.g survey cpue indices) of the predator and strata areas. FishStomachs uses three strata (ii-iv) in the calculation of population diet:

i. Sample level. stomach from the same sample (i.e. location and time). The average stomach contents at that level have been calculated in the previous examples ii. Sub-area level. ICES statistical rectangles have been used as sub-area strata. The average diet within a sub-area is in this example calculated as a weighted mean (weighted by the number of stomachs). iii. Area level. ICES roundfish areas are used for area strata. The average diet within an area is calculated a as weighted mean, weighted by the predator survey mean cpue by the sub-area (ICES rectangle) iv. Population. The average population diet is calculated from diet by area weighted by the extent of the ambient area of the area (e.g. number of ICES rectangles within a roundfish area)

The method for calculating the population is specified with options in the control object as shown below.

s<-edit_control(s,
        calc_sub_strata=list(
          # transform into relative weight before data are compiled
          relative_weight=FALSE,      
          # use number of stomach in the sample as weighting factor
          weighting_factor=expression(n_tot),
          # set file to NA when weighting_factor is given
          weighting_factor_file=NA
        ),
        calc_strata=list(
          relative_weight=FALSE,
          weighting_factor=expression(sqrt(mean_cpue)),
          weighting_factor_file=NA
        ),
        calc_total=list(
          relative_weight=FALSE,
          # set to NA when input is provided in a file
          weighting_factor=NA,
          # use data from a specified file (in those case, a file provided by the FishStomachs)
          weighting_factor_file=file.path(system.file( package = "FishStomachs"),"extdata","weighting_total.csv")
        )
)

print(attr(s,"control"),show='calc_diet')

# information from the group constants in the control object is also used.
print(attr(s,"control"),show='constants')

For each of the three strata, it must be specified whether the absolute or relative weight of the diet by strata should be used to calculate the weighted mean. The weighting factor for the for the weighted mean must either be given as an expression with R codes for variables within the STOMobs object or provided as input from a csv file. In this case, the weighting factor used to calculate the average diet per sub-area is the number of stomachs (n-tot) within each sample. The average diet at strata level uses the mean cpue variable within the STOMobs object. Finally, the population diet is calculated from the average diet by strata weighted by input values from a file. This file must contain a weighting factor ( w_fac_area) for each combination of predator name (pred_name), temporal strata (stratum_time), predator size class (pred_size) and name of the area (stratum_area). If input from files are used, it is possible to select combinations of areas and sub-areas that should be included, as only stomach data that have a match in weighing files are used in the calculation.

The input files for stratification can be made from a call to make_template_strata_weighting() using data and its control object. The call makes a csv file (if write_CSV=TRUE) and returns a data set with the template for stratification at the strata level specified.

print(attr(s,"control"),show='calc_diet')
make_template_strata_weighting(s,strata=c('sub_strata','strata','total')[3],write_CSV=FALSE)

# to show the contents of stratification file as created above (with write_CSV=TRUE)
read_csv(file.path(system.file( package = "FishStomachs"),"extdata","weighting_total.csv"),show_col_types=FALSE)

Now it is just to call thecalc_population diet function.

d<-calc_population_stom(s)
d

The population diet is stored in an object of class STOMdiet - more details in the next example.

# save data
 save(d,file=file.path(system.file( package = "FishStomachs"),"extdata","ex03d.Rdata"))


MortenVinther/FishStomachs documentation built on Feb. 14, 2025, 7:33 a.m.