The aim of this document is to outline the basic workflow of importing data downloaded from the ICES Regional Database & Estimation System (RDBES), or a list object containing data frames (or data.tables), into R using the RDBEScore package.
The function createRDBESDataObject is intended to directly import Commercial Landing (CL), Commercial Effort (CE) and Commercial Sampling (CS) tables downloaded from RDBES.
RDBEScore is an R package developed to facilitate the analysis of data from the ICES Regional Database and Estimation System (RDBES), a relational database system used by ICES to store data from regional sampling programmes. The package provides functions such as createRDBESDataObject, validateRDBESDataObject, filterAndTidyRDBESDataObject, combineRDBESDataObjects and removeBrokenVesselLinks for working with RDBES data. It is intended to be used by national data coordinators, data analysts, and scientists working with data from the RDBES.
The package can be found on RDBEScore GitHub.
The best way to get started is to read the package documentation and the vignettes. The package documentation can be found on the RDBEScore GitHub pages.
These resources explain how to install the package, how to use its functions, and how to contribute to its development. The following sections demonstrate some of the functionality currently available on the dev branch.
The suggested way to install the package is from the main or development (dev) branch on GitHub using the remotes package.
install.packages("remotes")
remotes::install_github("ices-tools-dev/RDBEScore@dev", build_vignettes = TRUE)
Then you can load the package using:
library(RDBEScore)
To see the complete list of vignettes available in the package use the following command:
browseVignettes(package = "RDBEScore")
The createRDBESDataObject function can directly import the .zip archive from the RDBES download containing all mandatory hierarchy tables plus VD and SL:
importedH1 <- createRDBESDataObject(input = "./vignettes/vignetteData/H1_2023_10_16.zip")
#print the not NULL table names
names(importedH1[!unlist(lapply(importedH1, is.null))])
For this to work, the tables must be located in the root of the zip file.
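One way to check this requirement before importing is to list the archive entries and look for paths that contain a directory separator. The sketch below uses only base R; the helper name entriesOutsideRoot is hypothetical and not part of RDBEScore, and the file names in the example are made up.

```r
# Hypothetical helper (not part of RDBEScore): given the entry names of a
# zip archive, return those that are NOT in the root of the archive.
entriesOutsideRoot <- function(entryNames) {
  # Directory entries end with "/" and nested files contain "/" in their path
  entryNames[grepl("/", entryNames, fixed = TRUE)]
}

# For a real archive the entry names can be read with base R:
#   entryNames <- utils::unzip("path/to/download.zip", list = TRUE)$Name
# Made-up entry names, for illustration only
exampleNames <- c("Design.csv", "Sample.csv", "nested/VesselDetails.csv")
entriesOutsideRoot(exampleNames)
```

An empty result means every entry sits in the root of the archive.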
The easiest way to get a glimpse of the imported data hierarchy and the row counts of the individual tables is to print the object. The printed summary also includes, for each table, the number of rows, the selection method, and the range of the number sampled and number total.
#calls the print function
importedH1
It can also import .zip archives containing only the CL, CE, VD or SL tables, in which case all other tables are included as NULL:
importedSL <- createRDBESDataObject(input = "./vignettes/vignetteData/HSL_2023_10_16.zip")
#print the not NULL table names
importedSL
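The idiom used earlier to list the non-NULL tables can be illustrated on a plain list. This is a minimal sketch with a made-up list standing in for an imported object; it does not call RDBEScore, and the table contents are invented.

```r
# Made-up stand-in for an imported object in which most tables are NULL
toyImported <- list(DE = NULL, SD = NULL, VD = NULL,
                    SL = data.frame(SLid = 1:2),
                    CL = NULL)

# Keep only the names of the elements that are not NULL
nonNullNames <- names(toyImported)[!vapply(toyImported, is.null, logical(1))]
nonNullNames
```

Using vapply() with logical(1) guarantees a logical vector of the same length as the list, which makes the subsetting step safe even when every table is NULL.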
It can also import a list object containing data frames (or data.tables). However, it should be noted that this type of import bypasses the RDBES upload data integrity checks.
#list of data frames
listOfDfsH1 <- readRDS("./vignettes/vignetteData/H1_2023_10_19.rds")
importedList <- createRDBESDataObject(listOfDfsH1)
It should be noted that the objects created are of the S3 class "RDBESDataObject". The class has defined print(), summary() and sort() methods. For more info on these see vignette Manipulating RDBESDataObjects.
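For readers unfamiliar with S3, the sketch below shows how a class attribute and a print method fit together. The class name toyDataObject and its print method are invented for illustration and do not reproduce the RDBESDataObject implementation.

```r
# Invented S3 print method: report the row count of each non-NULL table
print.toyDataObject <- function(x, ...) {
  for (tableName in names(x)) {
    if (!is.null(x[[tableName]])) {
      cat(tableName, ": ", nrow(x[[tableName]]), " rows\n", sep = "")
    }
  }
  invisible(x)
}

# A list tagged with the invented class; table B is deliberately NULL
toyS3 <- structure(list(A = data.frame(x = 1:3), B = NULL),
                   class = c("toyDataObject", "list"))

# print() dispatches to print.toyDataObject because of the class attribute
print(toyS3)
```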
RDBESDataObject structure can be validated using the validateRDBESDataObject() function.
validateRDBESDataObject(importedList, verbose = TRUE)
RDBESDataObjects can be filtered using the filterRDBESDataObject() function - this allows the RDBESDataObject to be filtered by any field. A typical use of filtering might be to extract all data collected in a particular ICES division.
myFields <- c("SDctry","VDctry","VDflgCtry","FTarvLoc")
myValues <- c("ZW","ZWBZH","ZWVFA" )
myFilteredObject <- filterRDBESDataObject(H1Example,
                                          fieldsToFilter = myFields,
                                          valuesToFilter = myValues )
# Number of rows in each non-null table
unlist(summary(myFilteredObject)$rows)
validateRDBESDataObject(myFilteredObject, verbose = FALSE)
It is important to note that filtering is likely to result in "orphan" rows being produced so it is usual to also apply the findAndKillOrphans() function to the filtered data to remove these records.
myFilteredObjectNoOrphans <- findAndKillOrphans(objectToCheck = myFilteredObject, verbose = FALSE)
validateRDBESDataObject(myFilteredObjectNoOrphans, verbose = FALSE)
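To illustrate what an "orphan" row is, the base R sketch below filters a parent table and then drops the child rows whose parent no longer exists. The two toy tables and their column names are invented for illustration; this is not how findAndKillOrphans() is implemented.

```r
# Toy parent and child tables linked by parentId (invented for illustration)
parent <- data.frame(parentId = 1:3, area = c("A", "B", "A"))
child  <- data.frame(childId = 1:5, parentId = c(1, 1, 2, 3, 3))

# Filter the parent table only: the child row of parentId 2 becomes an orphan
parentFiltered <- parent[parent$area == "A", ]
orphans <- child[!child$parentId %in% parentFiltered$parentId, ]

# Remove the orphans so that every child row has a parent again
childClean <- child[child$parentId %in% parentFiltered$parentId, ]
nrow(childClean)
```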
For more details on these functions, see the vignette Manipulating RDBESDataObjects.
Sometimes we want to see how a field or its values in the RDBESDataObject are connected to other tables. One use case is to find out when a specific Landing Event (LE) occurred. For this we can use the getLinkedDataFromLevel() function.
#get the TE table corresponding to the first LEid in the H8ExampleEE1 object
ld <- getLinkedDataFromLevel("LEid", c(1), H8ExampleEE1, "TE", verbose = TRUE)
knitr::kable(ld[,c(1:2,5:8 )])
Similarly, we can get the subset of a lower table, such as SA, corresponding to specific values in the TE table. The field used for the lookup does not have to be the id field; it can be any field in the table.
#get the SA table corresponding to the first 2 TEids in the H8ExampleEE1 object
ld <- getLinkedDataFromLevel("TEid", c(1,2), H8ExampleEE1, "SA", verbose = TRUE)
knitr::kable(ld[,1:5])
Lower hierarchy tables can also be used to get the matching subset of the higher hierarchy tables.
#which vessel caught those fish?
ld <- getLinkedDataFromLevel("BVfishId", c("410472143", "410472144"), H8ExampleEE1, "VS", TRUE)
knitr::kable(ld[,1:5])
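The idea of following links between tables can be sketched in base R by walking id columns from one table to the next. The three toy tables and their column names below are invented for illustration and do not reflect the RDBES table structure or the internals of getLinkedDataFromLevel().

```r
# Toy three-level hierarchy: vessel -> trip -> fish (invented column names)
vessel <- data.frame(vesselId = 1:2, name = c("Alfa", "Bravo"),
                     stringsAsFactors = FALSE)
trip   <- data.frame(tripId = 1:3, vesselId = c(1, 2, 2))
fish   <- data.frame(fishId = c("f1", "f2", "f3"), tripId = c(1, 3, 3),
                     stringsAsFactors = FALSE)

# Walk upwards from the selected fish to the vessels that caught them
selectedFish  <- fish[fish$fishId %in% c("f2", "f3"), ]
linkedTrips   <- trip[trip$tripId %in% selectedFish$tripId, ]
linkedVessels <- vessel[vessel$vesselId %in% linkedTrips$vesselId, ]
linkedVessels$name
```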
The RDBEScore package aims to provide a set of functions to estimate values from the RDBES data. The development of the estimation functions is ongoing and the current version provides functions to estimate values using the Multiple Count Estimator (MCE) for the upper hierarchies.
To estimate the last-level values for a single FMid, so that estimation is done for one top-level record, first select the corresponding BV rows:
FMidSel <- "4033243"
BV <- H1Example$BV[H1Example$BV$FMid == FMidSel,]
The estimMC(...) function is the core of the estimation functions and is also used when estimating across multiple levels. For implementation details see: Variance calculation functions using "Multiple count" estimator
estimMC(as.numeric(BV$BVvalueMeas), BV$BVnumSamp, BV$BVnumTotal, method=unique(BV$BVselectMeth))
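As rough intuition for what a design-based estimator computes from measured values, the number sampled and the number total, the sketch below implements the textbook expansion estimator for simple random sampling without replacement. It is an assumption made for illustration and is not the code or the exact formulas used by estimMC(...); the function name srsworTotal and the data are hypothetical.

```r
# Hypothetical helper: expansion estimator under simple random sampling
# without replacement (y = measured values, n = number sampled, N = number total)
srsworTotal <- function(y, n, N) {
  stopifnot(length(y) == n, n >= 2, n <= N)
  estMean  <- mean(y)
  estTotal <- N * estMean
  # Variance of the estimated total with finite population correction
  varTotal <- N^2 * (1 - n / N) * var(y) / n
  c(est.total = estTotal, est.mean = estMean, se.total = sqrt(varTotal))
}

# Made-up measurements: 3 units sampled out of 30
toyEst <- srsworTotal(y = c(10, 12, 14), n = 3, N = 30)
round(toyEst, 2)
```

The factor (1 - n / N) shrinks the variance as the sample approaches a full census, which is why both the number sampled and the number total are needed.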
Currently, estimation is done on an RDBESEstObject that is generated from an RDBESDataObject using createRDBESEstObject(...).
In the next sections we will use data from R packages survey and SDAResources that are converted into the RDBESDataObject to demonstrate the estimation procedure.
For more detailed information on the estimation functions see the vignette Estimating Population parameters from RDBESDataObjects.
#create the estimation object to estimate values on the SA table
estObj <- createRDBESEstObject(Pckg_SDAResources_agstrat_H1, 1, "SA")
res <- doEstimationForAllStrata(estObj, "SAsampWtMes")
# Get the estimated total and mean for "SAsampWtMes" for the VS stratum "NC"
columns2Get <- c("est.total","est.mean", "se.total","se.mean")
round(unlist(res[res$recType == "VS" & res$stratumName == "NC" ,columns2Get]),1)
How to interpret these results in the above example?
In practice, data submitters have to prepare InterCatch data call tables. In the future, the estimation functions should be able to estimate the values needed for these tables.
In the dev branch there is a function doBVestimCANUM(...) that can be used for a very basic estimation of the total catch at number (CANUM) for a specified biological variable, such as age or length.
For the simplest case of estimation we need the RDBESDataObject with CS tables and a CL table. In the following example we will estimate the total number of sprat caught in the area 27.3.d.28.1 with the gear OTM_SPF_16-31_0_0 for the first quarter of the year.
#From the commercial landings table we need to get the total weight of the catches
CLfieldstoSum <- c("CLoffWeight")
The most important thing in this estimation is to get the same strata for the CS and CL tables. This means we want to take the samples from the same area, with the same gear and the same species. Exactly how this is done depends on the upper and lower hierarchy used and how the sampling is stratified. In the following example we are using the lower hierarchy C meaning that we are extracting the BV data as the biological data.
#get the first quarter data from CS
strataListCS <- list(LEarea="27.3.d.28.1",
                     LEmetier6 = "OTM_SPF_16-31_0_0",
                     TEstratumName = month.name[1:3],
                     SAspeCodeFAO = "SPR")
#get the first quarter data from CL table
strataListCL <- list(CLarea="27.3.d.28.1",
                     CLquar = 1,
                     CLmetier6 = "OTM_SPF_16-31_0_0",
                     CLspecFAO = "SPR")
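Describing a stratum as a named list of field values can be sketched in base R as follows. The helper filterByStrataList and the toy landings table are hypothetical; how RDBEScore applies such lists internally is not shown here.

```r
# Hypothetical helper: keep the rows of df whose value in every listed
# field is one of the allowed values for that field
filterByStrataList <- function(df, strataList) {
  keep <- rep(TRUE, nrow(df))
  for (field in names(strataList)) {
    keep <- keep & df[[field]] %in% strataList[[field]]
  }
  df[keep, , drop = FALSE]
}

# Made-up landings rows, for illustration only
toyCL <- data.frame(
  CLarea      = c("27.3.d.28.1", "27.3.d.28.1", "27.3.d.28.2"),
  CLquar      = c(1, 2, 1),
  CLspecFAO   = c("SPR", "SPR", "SPR"),
  CLoffWeight = c(100, 200, 300),
  stringsAsFactors = FALSE
)
toyStrata <- list(CLarea = "27.3.d.28.1", CLquar = 1, CLspecFAO = "SPR")

selected <- filterByStrataList(toyCL, toyStrata)
sum(selected$CLoffWeight)
```

Because each list element may hold several values, a field such as TEstratumName can match any of the three month names of a quarter.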
A function currently in development, addCLtoLowerCS(...), can be used to add the CL data to the lower hierarchy.
#we are using the lower hierarchy C meaning that we are extracting the BV data
#as the biological data
biolCLQ1 <- addCLtoLowerCS(H8ExampleEE1, strataListCS, strataListCL,
                           combineStrata = TRUE,
                           lowerHierarchy = "C",
                           CLfields = CLfieldstoSum)
To estimate the total number of sprat caught in the area 27.3.d.28.1 with the gear OTM_SPF_16-31_0_0 for the first quarter of the year we need to use the function doBVestimCANUM(...). For more details on the function see the vignette in the dev branch. The output table should be enough to populate a classic InterCatch data call table.
lenCANUMQ1 <- doBVestimCANUM(biolCLQ1, c("sumCLoffWeight"),
                             classUnits = "Lengthmm",
                             classBreaks = seq(70,130,10),
                             verbose = FALSE)
knitr::kable(lenCANUMQ1[, c("Group", "WeightgMean", "LengthmmMean", "totNum")], digits = 2)
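The classBreaks argument above defines length classes from 70 to 130 in steps of 10. The base R sketch below shows how such breaks partition a set of measurements; the lengths are made up, and whether doBVestimCANUM(...) treats the intervals as left-closed, as assumed here, is not confirmed by this document.

```r
# Same break points as in the example above
classBreaks <- seq(70, 130, 10)

# Made-up fish lengths in mm
toyLengthsMm <- c(72, 80, 99, 105, 129)

# Assign each length to a left-closed class such as [70,80)
lengthClass <- cut(toyLengthsMm, breaks = classBreaks, right = FALSE)

# Count of fish per length class
table(lengthClass)
```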
If something is not working as expected, or if you have a feature request, please open an issue on RDBEScore GitHub.
The general idea of the package is to provide a set of tools to work with the RDBES data in R. The package is under development and we are looking for feedback from users to improve the package.
Of course you can also contribute to the development of the package by forking the repository and submitting a pull request.