(see Figure \@ref(fig:allhauls))
```r
library(tidyverse)
library(lubridate)
library(sf)
library(icesDatras)
library(tidyices)
# devtools::install_github("einarhjorleifsson/gisland", dependencies = FALSE)
library(gisland)
# devtools::install_github("ropensci/worrms")
library(worrms)
```
With the advent of the DATRAS webserver one has for some time been able to access the DATRAS data programmatically rather than through the point-and-click DATRAS download facilities. The icesDatras-package allows one to download and access the data directly in R via the `getDATRAS` function. The function is (currently) limited to accessing only one survey at a time, but multiple years and quarters can be specified. In addition, in its current form the haul data (HH), the length data (HL) and the age data (CA) have to be called separately (TODO: write a convenience function that handles this with only one function call?). The following code shows how one can access all the quarter 1 and 3 NS-IBTS data, here stored as a list in an R-binary file for later processing.
```r
# not run
yrs <- 1965:2018
qts <- c(1, 3)
hh_raw <- getDATRAS(record = "HH", survey = "NS-IBTS", years = yrs, quarters = qts)
hl_raw <- getDATRAS(record = "HL", survey = "NS-IBTS", years = yrs, quarters = qts)
ca_raw <- getDATRAS(record = "CA", survey = "NS-IBTS", years = yrs, quarters = qts)
raw <- list(hh = hh_raw, hl = hl_raw, ca = ca_raw)
write_rds(raw, file = "data-raw/datras/ns-ibts_raw.rds")
```
The "exchange" data do not strictly fall under the umbrella of tidy dataframes (see: Tidy Data) - @wickham2014tidy. To make all subsequent coding more streamlined an a priori re-coding of the exchange dataframes is hence warranted.
The data downloaded earlier can be read in by:
raw <- read_rds("data-raw/datras/ns-ibts_raw.rds")
As stated above, the data is stored as a list. Let's check the names of the object:
```r
names(raw)
```
The names refer to the separate dataframes, the haul data (hh), the length data (hl) and the age data (ca). Each of the dataframes can be viewed by running the following code (not run):
```r
raw$hh %>% glimpse()
raw$hl %>% glimpse()
raw$ca %>% glimpse()
```
```r
cn <- c(colnames(raw$hh), colnames(raw$hl), colnames(raw$ca))
cn.unique <- cn %>% unique()
```
The three dataframes have in total `r length(cn)` columns, but thankfully many of them are repeats and hence redundant. The additional processing that is done here is explained in detail below. At this stage it is not necessary for the novice user of DATRAS data to dig too deeply into the details of the tidying process; it can be revisited at a later stage. For those that have used the "exchange" data before, some reading of the details may be in order.
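A quick way to see how much overlap there is between the three dataframes, using the `cn` and `cn.unique` objects created above:

```r
# Compare the total number of columns with the number of distinct column names
length(cn)        # total number of columns in hh, hl and ca combined
length(cn.unique) # number of distinct column names
```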
The haul dataframe contains one record per haul and is in a relatively tidy format. The total number of variables (61) may, however, be a bit overwhelming for routine abundance estimates. The `tidy_hh`-function returns, among other things, a reduced set of variables with standardized (lowercase) names and a unique haul id. If all the original variables are needed, the argument `all_variables` in the `tidy_hh`-function can be set to TRUE (the default setting is FALSE).

```r
hh <- 
  raw$hh %>% 
  tidy_hh()
```
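Should the full set of the original variables be needed (as is done in the loop-script further below), the `all_variables` argument can be used, here stored in a separate object so as not to overwrite hh:

```r
# Retain all of the original (61) haul variables
hh_all <- 
  raw$hh %>% 
  tidy_hh(all_variables = TRUE)
```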
We are often interested in analyzing the data by ICES areas, hence one can add a variable (here called faoarea) to the haul data, based on the shooting coordinates (link to some details of the `geo_inside`-function):
```r
# Read in the FAO area from the web:
fao <- 
  read_sf_ftp("fao-areas_nocoastline") %>% 
  as("Spatial")
# Find the faoarea attribute of each shooting location
hh <- 
  hh %>% 
  mutate(faoarea = geo_inside(shootlong, shootlat, fao, "name"))
```
TODO - MOVE THIS INTO A LATER SECTION: In this booklet the example code is largely based on the NS-IBTS data. In some of the steps the "[Roundfish area]{#nsrf}" is used. A new numerical variable, containing the code of the Roundfish area that the haul belongs to, is created by:
```r
ns_area <- 
  read_sf_ftp("NS_IBTS_RF") %>% 
  as("Spatial")
hh <- 
  hh %>% 
  mutate(nsarea = geo_inside(shootlong, shootlat, ns_area, "AreaName") %>% as.integer())
```
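As a quick sanity check one can e.g. tabulate the number of hauls by Roundfish area (hauls that fall outside the defined areas are expected to come out as NA):

```r
# Number of hauls by Roundfish area
hh %>% count(nsarea)
```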
The exchange format of the length-related measurements is a bit messy (e.g. the species are coded and the reported numbers depend on the variable DataType). A convenient function, `tidy_hl`, basically takes care of this, returning the length in centimeters, the number of fish measured and the species as a Latin name. In order for it to complete its job we need, in addition to the raw hl-dataframe, to pass the tidy haul-dataframe (because that is where the variable DataType is stored). And to convert the coded species information to a Latin name we need to supply the proper species "lookup" table (TODO: provide a link to the auxiliary chapter explaining how one can obtain this from scratch rather than via the temporary csv file):
```r
species <- read_csv("ftp://ftp.hafro.is/pub/reiknid/einar/datras_worms.csv")
hl <- 
  raw$hl %>% 
  tidy_hl(hh, species)
```
So, starting with the raw length dataframe containing 27 variables, what is returned is a dataframe that contains only 5 variables: the haul id, the species name (latin), sex, length (in centimeters) and the number of fish measured (n).
hl <- read_rds("data/ns-ibts_hl.rds")
glimpse(hl)
NOTE: Sometimes only counts are reported in the hl-data (length NA), sometimes total weight, etc. This needs to be covered if possible and, if not, those records should be dropped in the tidying.
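A sketch of how one might screen for such records, using the variable names returned by `tidy_hl`:

```r
# Tally records where no length was recorded, by species
hl %>% 
  filter(is.na(length)) %>% 
  count(latin, sort = TRUE)
```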
... draft to be written
```r
ca <- 
  raw$ca %>% 
  tidy_ca(species)
```
hh %>% write_rds("data/ns-ibts_hh.rds") hl %>% write_rds("data/ns-ibts_hl.rds") ca %>% write_rds("data/ns-ibts_ca.rds")
The above code shows how to obtain the haul, length and age data for one survey. Since we may be interested in looking at more than one survey, the code below describes how to get all the survey data stored in the DATRAS database via a loop-script:
```r
# Get an overview of all the surveys
dtrs <- icesDatras::getDatrasDataOverview()

# Loop through each survey, download and save ----------------------------------
for(i in 1:length(dtrs)) {
  
  sur <- names(dtrs[i])
  print(sur)
  yrs <- rownames(dtrs[[i]]) %>% as.integer()
  qts <- c(1:4)
  # An error occurs in the NS-IBTS if all quarters are requested
  if(sur == "NS-IBTS") qts <- c(1, 3)
  
  hh_raw <- icesDatras::getDATRAS(record = "HH", survey = sur, years = yrs, quarters = qts)
  hl_raw <- icesDatras::getDATRAS(record = "HL", survey = sur, years = yrs, quarters = qts)
  ca_raw <- icesDatras::getDATRAS(record = "CA", survey = sur, years = yrs, quarters = qts)
  
  list(hh = hh_raw, hl = hl_raw, ca = ca_raw) %>% 
    write_rds(path = paste0("data-raw/datras/", tolower(sur), "_raw.rds"))
}
```
```r
# Make sure the needed objects are available
fao <- gisland::read_sf_ftp("fao-areas_nocoastline") %>% as("Spatial")
ns_area <- gisland::read_sf_ftp("NS_IBTS_RF") %>% as("Spatial")
species <- read_csv("ftp://ftp.hafro.is/pub/reiknid/einar/datras_worms.csv")

fil <- dir("data-raw/datras", full.names = TRUE)

# Setup list objects to temporarily store the results
res_hh <- res_hl <- res_ca <- list()

# Loop through each survey
for(i in 1:length(fil)) {
  
  raw <- read_rds(fil[i])
  sur <- raw$hh$Survey[1] %>% tolower()
  
  hh <- 
    raw$hh %>% 
    tidy_hh(all_variables = TRUE) %>% 
    mutate(nsarea = gisland::geo_inside(shootlong, shootlat, ns_area, "AreaName") %>% as.integer(),
           faoarea = gisland::geo_inside(shootlong, shootlat, fao, "name"),
           # TODO: this should be part of the tidy-function
           rigging = as.character(rigging),
           stratum = as.character(stratum),
           stno = as.character(stno),
           hydrostno = as.character(hydrostno))
  
  if(!is.null(raw$hl)) {
    hl <- 
      raw$hl %>% 
      tidy_hl(hh, species)
  }
  
  if(!is.null(raw$ca)) {
    ca <- 
      raw$ca %>% 
      tidy_ca(species)
  }
  
  hh %>% write_rds(paste0("data/", sur, "_hh.rds"))
  if(!is.null(raw$hl)) hl %>% write_rds(paste0("data/", sur, "_hl.rds"))
  if(!is.null(raw$ca)) ca %>% write_rds(paste0("data/", sur, "_ca.rds"))
  
  # temporary storage
  res_hh[[i]] <- hh
  res_hl[[i]] <- hl
  res_ca[[i]] <- ca
}

# Bind all the DATRAS data and save for later retrieval
res_hh %>% bind_rows() %>% write_rds("data/hh_datras.rds")
res_hl %>% bind_rows() %>% write_rds("data/hl_datras.rds")
res_ca %>% bind_rows() %>% write_rds("data/ca_datras.rds")
```
In all its simplicity and with no extra frills the code to download, tidy and save one survey is as follows:
```r
spe <- read_csv("ftp://ftp.hafro.is/pub/reiknid/einar/datras_worms.csv")
sur <- "NS-IBTS"
yrs <- 1965:2018
qts <- c(1, 3)

hh <- 
  getDATRAS(record = "HH", survey = sur, years = yrs, quarters = qts) %>% 
  tidy_hh()
getDATRAS(record = "HL", survey = sur, years = yrs, quarters = qts) %>% 
  tidy_hl(hh, spe) %>% 
  write_rds("data/ns-ibts_hl.rds")
getDATRAS(record = "CA", survey = sur, years = yrs, quarters = qts) %>% 
  tidy_ca(spe) %>% 
  write_rds("data/ns-ibts_ca.rds")
hh %>% write_rds("data/ns-ibts_hh.rds")
```
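In later sessions the tidy files can then simply be read back in:

```r
hh <- read_rds("data/ns-ibts_hh.rds")
hl <- read_rds("data/ns-ibts_hl.rds")
ca <- read_rds("data/ns-ibts_ca.rds")
```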