knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

This R package is dedicated to the construction of fishery-dependent data container objects. The mechanism to build up such containers is generic in order to facilitate the answers to the fishery-dependent data calls and to tackle the issues of providing the same information in different formats. The construction ensures intrinsically the quality of the data these objects contain. In this framework, we address the definition of data quality in the first part of this document. The document provides some practical examples of the construction of specific data containers.

Framework

Fishery data are usually collected at the national level, following national working plans and using ad-hoc infrastructure and database format. Upon agreement, these data are then transmitted to the Regional Fishery Management organizations (RFMO). RFMO's data calls define the type of data countries have to provide, the form these pieces of information have to follow, the way how the data are transmitted and the deadlines.

Fishery-dependent data

This document focus only on fishery-dependent data: data that are collected from the fishery, not the one collected during the scientific surveys or other activities not based on fishermen activities. A complete description of the information collectable from the fisheries is outside the scope of this document. This document will focus mainly on fishery characteristics (vessels characteristics, fishing areas, period of fishing, metiers used, fishing effort), the catches (landings and discards of the species), and the population descriptors of the catches: the numbers at length, the numbers at age and some biological parameters (maturity, sex...) by species.

Data call

The fishery-dependent data are one of the inputs of stock assessment models. Besides this critical use, these data by itself provide information about the fisheries states and behaviours: a standard prerequisite to the understanding of the fishing pressures on the stock and its influence on the dynamics of the stock. The demands on fishery-dependent data by RFMO has increased in the past few years. For example, the data calls issuing by the International Council of the Exploration of the Seas (ICES) have increased by three for the stocks identified as data-limited in 2016 and 2017. For such stocks, the data provided has to cover three years instead of one, logically multiplying the processing time of the information by three. Moreover, differents RFMOs can ask for the same information in different formats. It is the case for the European Countries having fishing fleets in the Mediterranean Sea: both the General Fishery Commission of the Mediterranean (GFCM) and the Joint Research Center (JRC, a research institution belonging to the European Union) request the same fishery-dependent data for these countries but follow different data calls (deadline and file formats differ profoundly).

Data quality

The quality of a dataset is a vague concept. A definition of this concept has to mix philosophical and practical considerations. Philosophy helps to understand the link between reality (the fishery activities and its impact on stocks) and the data representing this process. Floridi's paper (2005) discuss the link between semantic information and meaningful data in a general way. Practical considerations consider first the real use of the data collected. Wang and Strong (1996) define "data quality" as data that are fit for use by data consumers. Between these two extremes views, this document relies on the data quality framework proposed by the GFCM (GFCM, 2019). This choice does not tell that this framework is better than another one, but it has the merit to be a complete one available for fishery data. Data quality indicators, their definitions and the corresponding practical implementation are defined pages 82 and 83 of the document. They are seven indicators we redefined here in a broader sense:

The timeliness, completeness and adequacy indicators will be not considered in this document. Their assessments are particular to the data calls and ask for extra information not belonging to the construction of the data container. However, considering the "fit for use" proposed by Wand and Strong (1996), they are probably the most important in term of quality: if no, partial or inadequate data are provided to answer a data call, what is the meaning of the related data collection?
The conformity, stability, consistency and accuracy indicators are data quality indicators which can be verified during the elaboration of the data at the national level. In these indicators, the conformity is an indicator intrinsically linked to the data container. For example, the conformity data quality indicators who checks if a landing value is a positive number, means that the column of the data table containing the landing value has to be defined as a positive number. If the data container of this data table is well defined, nothing else than a positive number can be entered by the user for this variable. This document will focus on the construction of data containers conform by definition. Shorter examples will illustrate the implementation of some of the three other indicators (stability, consistency and accuracy). These indicators are less critical in term of data quality. Stability indicator signals a drastic change of a parameter, but this change can be a real change in the fishery dynamic. This indication can be interesting to point out for the end-user, as stock assessments models can be sensitive to a significant variation in their input, but per se we consider stability as a secondary step in data quality assessment. The consistency indicator is dependent on the format of the data calls. It has to be checked in the data calls where redundant information is requested. For example, if catches, landings and discards are all requested, the total catches have to be equal to the sum of the total landings and the total discards. This example seems to be trivial, but at the national level, the data calls answers can involve different institutes (to complete the previous example, institute A providing the discards estimates, and institute B the census of the landings), and this kind of check is rarely done across different institutes. The accuracy indicator covers a large body of practice in statistics. As the stability indicator, its implementation can lead to false positive, because the definition of what is a realistic value of fishery parameters in the context of a given data call is open to discussion. However, like stability, this indicator can inform the user to some errors or change in the fishery or stock dynamic, and but per se we consider accuracy as a secondary step in data quality assessment.

Data format in fishery science

The need to have a versatile tool able to build different data container for the same information relies on the multiplicity of the data format requested by the RFMOs. A data format is a formal definition of how the data are transmitted to the RFMOs. The definition includes the files format and the codification of the data in these files. For European Countries, according to the distribution of the stock their fleets target, a short list could be:

Other RFMOs exist (CCAMLR, SPF...) and are not considered here, as they are less concerned by European fleets. The main RFMOs (ICES, JRC, GFCM) for the European countries lead to the manipulation of at least 4 data format to answer the regular data call. All these formats are documented:

An interesting common feature of these documents is their evolution in time. All of them have experienced during the past three years (FDI, GFCM) or are experiencing significant changes (ICES). These significant changes regard the basic structure of the files exchanged with the countries, and on a less critical level, these changes also concern the definitions of the reference list of some individual parameters. As an example, the interested reader can compare the data structure described in Jansen et al. (2016) with the one in development in ICES 2018a.

To conclude this short view on the fishery-dependent data, the increasing of the data calls and the instability on the data formats requested by the RFMO highlight the need to develop a tool able to cope with these constraints, with the guarantee of ensuring the quality of the data transmitted to the RFMOs.

\newpage

Rationale

The construction principle of fishery-dependent data containers is illustrated using the Fishframe data format (see Fig. 1 from Jansen et al. 2016).

Fishframe data format

\newpage

The careful inspection of this current format highlights essential properties:

These comments apply to other RFMOs data formats. In summary, time and space references are essential information, and other fishery objects information (fishing vessel description, characteristics of a haul, sampling reference) are requested by all the data calls. Inside each data calls, the labelling and the aggregation of the information can change. For example, the temporal unit of GFCM data call is the year, while for ICES Intercatch related data it can be the month, the quarter or the year. For fixed information, the codification can change while the parameter conveys the same information. The metier lists are not the same for ICES, GFCM or FDI data call but a trawler of 25 meters long remains the same trawler even if its name is OTB_DEF_70-99_0 for ICES, T12 for GFCM and TRAWL/OTB/70D100 for the FDI data call.

Considering these properties, the main task to translate national data to data calls (apart from the estimation work that this document do not address) is a translation work in time and space plus a renaming of the information properties using a different convention. To do this, this package provides some generic data containers for time, for space and for general fishery object. These data containers embed some strict definition of the data types and ensure the conformity of the information intrinsically.

Data containers and S4 R class

The data containers are build using S4 class objects. Advanced knowledge of R and an understanding of the oriented-object OO programming can be helpful to understand the way the data containers are built and translated into data files conform to the data-calls specification. The book of H. Wickham (2014) covers all these topics. The S4 objects in R have fundamental properties related to the fishery data container: they have a formal definition to ensure conformity and inheritance properties who simplify the construction of complex data containers.

Information

The data container has to record information at its smallest resolution. If a trip is located in space by the vessel trajectory, the data container will contain this trajectory. If for another trip, only the ICES division is known, then this information is recorded in the data container. This consideration is the same for the time, and for all the other fishery information: one record by parameters at the smallest resolution possible. The R classes and objects will then ensure the translation of the data into the data-calls requirements using methods and transparent reference lists and algorithms.

An operational example

A data-call asks for the monthly total landings for all the fishing areas where vessels operate. To answer this data-call, the creation of an object Landing containing (1) the landings values, (2) the corresponding dates (in this case the months) and (3) areas where fishing occurs. The object Landing contains three entities:

The objects (2) and (3) refer only to the time and the spatial characteristics of the landing events. They are built independently, and their conformity checked during the building process. The same construction principles apply for the object (1): in this simple example, this object is a positive numerical vector. To build the final Landing object, the objects (1), (2) and (3) are associated together, in a way they keep their internal validations mechanisms. The Landing object inherits the conformity checks and the objects types from which it was built. In the R language, the definition of the objects, their validities and their inheritance are based on the class definition, here the S4 classes.

\newpage

The elementary classes

The classes identified as elementary are presented in the next section. They are implemented in the present package, but their implementations are subject to change according to the user needs and the data-calls scopes. We believe that the Time and the Space classes are generic enough to be needed across every data calls.

The library is then loaded:

library(CLEFRDB)

\newpage

The Time class

Temporal references are defined by a type (TimeType) and a date (TimeDate).

new("Time")

The type provides the temporal resolution at which the information is recorded, and the date the central date following the POSIXct format of the temporal event. Five TimeType are available:

print(Timetype)

corresponding respectively to annual, quarterly, monthly, daily and date with time events.

The definition of a Time object for a regular date (a nightly haul in May 2011 for example) is:

hauldate<-as.POSIXct(strptime("2011-03-27 01:30:03",
               "%Y-%m-%d %H:%M:%S"))
new("Time",TimeType="date",TimeDate=hauldate)

Why adding such complex definition of a temporal event when adding column for year, month, day or quarter could be more straightforward ? For two reasons:

To illustrate these properties, let see a simple example with a set of four differents temporal events recorded at different time scales:

haultime<-c("2011-03-27 01:30:03",
        "2011-03-27 12:00:00",
        "2011-03-15 12:00:00",
        "2011-02-14 00:00:00")
haultime<-as.POSIXct(strptime(haultime,
               "%Y-%m-%d %H:%M:%S"))
haultime<-new("Time",TimeType=c("date","day","month","quarter"),
    TimeDate=haultime)
print(haultime)

The four hauls date are recorded respectively at the date, the day, the month and the quarter scale. The corresponding dates are the dates at the middle of the day, the month and the quarter. If a data-call request date in year, the conversion is easy to compute:

substr(haultime@TimeDate,1,4)

More complex time manipulations are available using the lubridate R package.

Data quality

The conformity of the Time object is checked during the object construction. A Time class object is checked:

These commands will generate an error:

new("Time",TimeDate=c(NA),TimeType=c("day"))
new("Time",TimeDate=.POSIXct(c(NA,NA)),TimeType=c("day","day"))
new("Time",TimeType="date",TimeDate="2011-03-27")
hauldate<-as.POSIXct(strptime("2011-03-27 01:30:03",
               "%Y-%m-%d %H:%M:%S"))
new("Time",TimeDate=hauldate,TimeType=c("a day"))
new("Time",TimeDate=hauldate)

This behaviour is not user-friendly, but simply asks the user to be consistent for a very simple field (time !). Adressing formatting error at the very beginning of the data preparation is the best way to propagate conformity quality inside a dataset.

An example

If a data-call request a data table containing the time series of the total landings. The S4 object related to this demand can be defined using the Time class as the container for the temporal definition, adding a new slot for the landings values. In this case, the new object class named Landings will inherit the Time class properties. The conformity check and the other methods belonging to the Time class will be transmitted to the new Landings object. The function initinherit of this package copies the parameters from one object to another based on their typology. For the Landings class containing the Time class, the Time parameters will be copied inside the Landings object.

#a Time object related to four haul
        haultime<-c("2011-03-27 01:30:03",
            "2011-03-27 12:00:00",
            "2011-03-15 12:00:00",
            "2011-02-14 00:00:00")
        haultime<-as.POSIXct(strptime(haultime,
                      "%Y-%m-%d %H:%M:%S"))
        haultime<-new("Time",TimeType=c("date","day","month","quarter"),
                          TimeDate=haultime)
# the value of the total landings of the four hauls
w<-c(10000,3000,2000,10)
#Definition of the Landings class
setClass(Class="Landings",
         slots=c("landings"="numeric"),
         contains=c("Time"),
         prototype=prototype(Landings=numeric(),
                             Time=new("Time"))
         )
#a new Landings object
new("Landings")
#a new Landings object initialized with the previous values
initinherit(new("Landings"),landings=w,Time=haultime)

Space

The Space class follow the same class structure of the Time class. Spatial references are defined by a type (SpaceType) and a place (SpatialPlace).

new("Space")

The type provides the spatial type at which the information is recorded, and the place the name of the spatial area corresponding to the type Both paramters are character vector Four SpaceType are available:

print(Spacetype)

corresponding respectively to the ICES division (e.g. 27.7.g), the ICES rectangle (e.g. 50H6), the harbour in UNLOCODE (e.g. DEBRB) and the GSA area (e.g. GSA07). The object defspace of class sf gives the spatial geometry of these differents spatial entities using simple features ISO standard definition.

head(defspace)

A plot of this object shows a map of all the geometry available.

plot(defspace[1:10,"type"],axes=T)

The contruction of a simple Space object containing nine spatial objects (three ICES divisions, three ICES rectangles, two GSA areas and a harbour) is:

tripgeo<-new("Space",SpacePlace=c("27.7.f","27.7.g","27.7.h",
                  "27E8","27F0","28F0",
                  "GSA07","GSA08",
                  "FRRTB"),
          SpaceType=c("ICESdiv","ICESdiv","ICESdiv",
                  "ICESrect","ICESrect","ICESrect",
                  "GSA","GSA",
                  "harbour"))
print(tripgeo)

The plot method attached to the Space class map the geometry of the resulting object:

map(tripgeo,axes=T)

Validation mechanisms are operating on the SpacePlace and SpaceType slots. The following commands will return an error.

new("Space",SpacePlace=c("27.7.h","GSA078"),
            SpaceType=c("ICEdiv","GSA"))
new("Space",SpacePlace=c("27.7.h","GSA07","FRRTB","DEBRB"),
            SpaceType=c("ICESdiv","GSA","harbour"))

An example

To illustrate the Space class use in an operational example, the previous example is extended. The data-call demands now the landings located in time and space. To define the new object, as previously, a Time object is build related to the date at which the four hauls were made, and a Space object referencing the areas where the hauls occur is added. Then the new object is defined as a new S4 class containing the landings values (the landings slot), the date of the hauls (the Time slot) and their positions (the Space slot). The values of the haultime and haulspace are copied in the new Landings object using the initinherit function. There is no need to check the conformity of the haultime and haulspace objects; this property was checked during the construction.

#a Time object related to four hauls
        haultime<-c("2011-03-27 01:30:03",
            "2011-03-27 12:00:00",
            "2011-03-15 12:00:00",
            "2011-02-14 00:00:00")
        haultime<-as.POSIXct(strptime(haultime,
                      "%Y-%m-%d %H:%M:%S"))
        haultime<-new("Time",TimeType=c("date","day","month","quarter"),
                          TimeDate=haultime)
# the value of the total landings of the four hauls
w<-c(10000,3000,2000,10)
#the area where the hauls were located
haulspace<-new("Space",SpacePlace=c("27.7.f","27.7.g",
                  "27E8","GSA07"),
          SpaceType=c("ICESdiv","ICESdiv",
                  "ICESrect","GSA"))
#Definition of the Landings class
setClass(Class="Landings",
         slots=c("landings"="numeric"),
         contains=c("Time","Space"),
         prototype=prototype(landings=numeric(),
                             Time=new("Time"),
                             Space=new("Space"))
         )
#a new Landings object
new("Landings")

#a new Landings object initialized with the previous values
haullanding<-initinherit(new("Landings",landings=w),Time=haultime,Space=haulspace)
print(haullanding)
map(haullanding)
#dimclass(haullanding)

Regarding the conformity checks of the Landings object, one issue remains. The initinherit function checks if the numbers of objects in each class are the same, and the conformity of the Time and Space objects was checked during their constructions, but the condition of having only positive values in the landing slot is not test. To implement this new conformity check, a validity function can be define and associated to the Landings cass. This function checks for an equal number of objects in each class, and test if the landings are positive values. For a more advanced discussion on how to define this validation mechanism, see the R's help about the setValidity method.

#validity of the landings object
validLandings<-function(object){
    if(any(object@landings<0)){print("negative landings !!");FALSE}else{TRUE}
}
#associate the validation function to the landings object
setValidity("Landings",validLandings)
#the original object is still valid
initinherit(new("Landings"),landings=w,Time=haultime,Space=haulspace)
#a negative value in the landings slot will throw an error
## w<-c(10000,3000,-2000,10)
## initinherit(new("Landings",landings=w),Time=haultime,Space=haulspace)

Application

In this section, two examples are provided. The first example show how to generate object from the one available in this package, and some new objects to generate the table TR in the Fishframe format (ICES, 2018b). The second example will use these objects to generate the Fishing Trip table of the new RDBES format (ICES, 2019).

The table Trip in Fishframe

This table contains information regarding the fishing trip (identifier of the trip, number of hauls, days at sea, harbour of landing), the vessel (identifier, flag, length, power, size, type), and the sampling (type, year, method, sampling country, project name). To complete the information needed in this table, the class Space and Time will be used to provide the harbour and the year, while the Vessel and Sampling classes are built from scratch to provide the requested information.

A Vessel class

The Vessel class contains all the information related to the vessel. In the six slots of this class, the vessel identifier, its flag, type, length, power and size are recorded. To lighten this example, the conformity of the object will be not adressed here. Only the type of the slot (character, integer, numeric...) are checked. To validate the conformity of the object, the previous example shows how to define a validation function and to associate it to the object. The information of a fictional vessel populate this new object.

#Vessel class
setClass(Class="Vessel",
        slots=c(VesselId="character",
                VesselFlag="character",
                VesselType="character",
                VesselLength="integer",
                VesselPower="integer",
                VesselSize="integer"
                ),
        prototype=prototype(VesselId=character(),
                            VesselFlag=character(),
                            VesselType=character(),
                            VesselLength=integer(),
                            VesselPower=integer(),
                            VesselSize=integer()),
        )
#a fictional vessel
myVessel<-new("Vessel", 
          VesselId="AAAA123",
          VesselFlag="FRA",
              VesselType=c("Trawler"),
          VesselLength=as.integer(20),
          VesselPower=as.integer(5000),
              VesselSize=as.integer(1000))

A new class Sampling is defined following the same approach.

#Sampling class"
setClass(Class="Sampling",
     slots=c(SamplingId="character",
         SamplingType="character",
         SamplingMethod="character",
         SamplingProject="character",
         SamplingCountry="character"),
     prototype=prototype(SamplingId=character(),
                 SamplingType=character(),
                 SamplingMethod=character(),
                 SamplingProject=character(),
                 SamplingCountry=character())
     )

#a fictionnal sample
mySampling<-new("Sampling",
        SamplingId="SAMP2199",
        SamplingType="S",
        SamplingMethod="Observer",
        SamplingProject="SamplingProjectFRA",
        SamplingCountry="FRA")

To complete the trip information, two Time and wto Spatial objects are defined for this trip. They refer to the location and the date of the beginning and the end of fishing trip:

mySpaceDep<-new("Space",SpacePlace=c("FRCOC"),SpaceType=c("harbour"))
mySpaceArr<-new("Space",SpacePlace=c("FRCOC"),SpaceType=c("harbour"))
myTimeDep<-new("Time",
        TimeDate=as.POSIXct(strptime("2011-03-15 12:16:00","%Y-%m-%d %H:%M:%S")),
        TimeType="date")
myTimeArr<-new("Time",
        TimeDate=as.POSIXct(strptime("2011-03-19 04:00:32","%Y-%m-%d %H:%M:%S")),
        TimeType="date")

Then, a new class corresponding to the Trip table is defined:

# a new Trip class
setClass(Class="Trip",
        slots=c(nbhaul="integer",
                daysatsea="integer",
        TripId="character",
        TimeDep="Time",
        TimeArr="Time",
        SpaceDep="Space",
        SpaceArr="Space"
                ),
        contains=c("Vessel",
                "Sampling"
                ),
         prototype=prototype(TripId=character(),
                 nbhaul=integer(),
                             daysatsea=integer(),
                             Vessel=new("Vessel"),
                             Sampling=new("Sampling"),
                             TimeDep=new("Time"),
                             TimeArr=new("Time"),
                             SpaceDep=new("Space"),
                             SpaceArr=new("Space")
                             )
)

myTrip<-initinherit(new("Trip",
            TripId="FR872",
            nbhaul=as.integer(12),
            daysatsea=as.integer(5)),
            Sampling=mySampling,
            Vessel=myVessel,
            TimeDep=myTimeDep,
            TimeArr=myTimeArr,
            SpaceDep=mySpaceDep,
            SpaceArr=mySpaceArr)

Then, the export this object into the Trip format requested by Fishframe is only a matter of formatting (we suppose that all the field were checked for conformity, with the right validation function associated to each object). Here an example of such formatting:

FishframeTrip<-data.frame(`Record type`="TR",
              `Sampling type`=myTrip@SamplingType,
              `Landing country`=myTrip@SamplingCountry,
              `Vessel flag country`=myTrip@VesselFlag,
              `Year`=substr(myTrip@TimeArr@TimeDate,1,4),
              `Project`=myTrip@SamplingProject,
              `Trip code`=myTrip@TripId,
              `Vessel length`=myTrip@VesselLength,
              `Vessel power`=myTrip@VesselPower,
              `Vessel size`=myTrip@VesselSize,
              `Vessel type`=myTrip@VesselType,
              `Harbour`=myTrip@SpaceArr@SpacePlace,
              `Number of hauls`=myTrip@nbhaul,
              `Days at sea`=myTrip@daysatsea,
              `Vessel identifier`=myTrip@VesselId,
              `Sampling country`=myTrip@SamplingCountry,
              `Sampling method`=myTrip@SamplingMethod)

The table Fishing Trip in the RDBES

According to the Trip object, the formatting for the Fishing Trip table of the RDBES is done using the same information. Some parameters are not available in the original Trip object, but they can be added by the user according to its needs. For example the probability of sampling can be added to the Sampling object in order to complete the information requested by the RDBES. The export with the original Trip object to the RDBES format is:

RDBESTrip<-data.frame(`FishingTripID`=myTrip@SamplingId,
        `OnShoreEventID`="",
        `VesselSelectionID `="",
        `VesselDetailsID `=myTrip@VesselId,
        `SampleDetailsId`=myTrip@SamplingId,
        `Record type`="FT",
        `National Fishing trip Code`=myTrip@TripId,
        `Stratification`="",
        `Trip Stratum`="",
        `FishingTripsClustering`="",
        `FishingTripsClusterName`="",
        `Sampler`="",
        `NumberOfHauls`=myTrip@nbhaul,
        `Departure location`=myTrip@SpaceDep@SpacePlace,
        `Depature date`=substr(myTrip@TimeDep@TimeDate,1,10),
        `Depature time`=substr(myTrip@TimeDep@TimeDate,12,19),
        `Arrival location`=myTrip@SpaceArr@SpacePlace,
        `Arrival date`=substr(myTrip@TimeArr@TimeDate,1,10),
        `Arrival time`=substr(myTrip@TimeArr@TimeDate,12,19),
        `FishingTrips Total`="",
        `FishingTrips Sampled`=1,
        `FishingTripSampelProbability`="",
        `Selection method`="",
        `Selection Method Cluster`="",
        `FishingTripsTotClusters`="",
        `FishingTripsSampClusters`="",
        `FishingTripsClustersProb`="")

Conclusion and future work

s R library proposes an implementation of classes of objects currently used by fishery-dependent data. The definition of S4 classes and its strict validation mechanism lead to generating objects which strictly follow the conformity of their description (variable type, numerical range, compliance to a lookup table...). The inheritance among S4 objects gives to the user the ability to build more complex objects from smaller objects, conservating the conformity of these smaller objects. These objects have no prerequisite in term of format: they can be created from the national data available in each country. Then exporting these objects into the multiple formats requested by the different RFMOs is straightforward, and these objects, by construction, will pass the conformity checks implemented on the various upload facilities.

If the two examples presented in this document generate raw sampling dataset, other data-call request elevated data at the population level (GFCM, ICCAT, ICES Intercatch). So there is a strong need for methods associated with these object to raise the sampling data to the population level. Future work will implement such methods including generic statistical sound sampling elevation methods, from ratio-estimators to probability-based sampling estimators. The main difficulty will be to free the elevation methods from any data format. A significant benefit should be the smooth transmission and adaptation of these methods to the RDBES format, and possibly to other fishery-dependent data formats.

\newpage

Bibliography



ldbk/CLEFRDB documentation built on June 2, 2019, 9:07 p.m.