Functions to access marine data via Web Services

An increasing number of marine data portals are making web services available. Alongside traditional downloads from the browser, these allow accessing data programmatically directly from scripting softwares such as Rstudio.

The main advantages are two-folds: 1) the data fetched by R scripts are always up to date, a feature particularly useful when using data portals that are regularly updated; 2) fetching, formatting and cleaning steps can be done as part of a single seamless workflow.

merpWS is a collection of R functions making use of available marine web services to retrieve data from the main marine data portals, including CEFAS, EMODNet and the PML Trait Explorer.


merpWS can be installed using the install_github function from the devtools package.

devtools::install_github(repo = "MarineEcosystemResearchProgramme/merpWS")


Below is a possible workflow making use of merpWS functions. It exemplifies how data from different data sources can be combined to create added value. Typically we combine an observed variable with one or several explanatory ones. Note that web services are in most cases recent additions to data portals and bugs are not uncommon. We are working in collaboration with data holding institutes to maintain merpWS.

Cefas data

CEFAS has recently made its API available. It allows exploring and downloading some the data described in its new Data Hub.

# list all data files
mydatasets <- get_cefas_datasets()

# extract csv files (only downloadable format)
mydownloads <- mydatasets$to_download
mydata <- mydownloads[explore_cefas_data(mydownloads$Name, keyword = "SWT"), ]

CEFAS temperature data from different projects are available. They can be downloaded one by one and aggregated. Below is an example for one of the data sets.

temp_data <- download_cefas_data(recordset_id = mydata$Id[1])

We aggregate 10 CEFAS data sets together to obtain the data.frame temp_data. The Dates are reformatted to create year, month and day columns.

# extract date in year, month and day from "Time"
inter <- strsplit(as.character(temp_data$Time), split = " ") %>%
  sapply(., function(x) x[[1]]) %>%
  strsplit(., split = "/") 
temp_data$day <- as.numeric(sapply(inter, function(x) x[[1]]))
temp_data$month <- as.numeric(sapply(inter, function(x) x[[2]]))
temp_data$year <- as.numeric(sapply(inter, function(x) x[[3]]))
# plot temperature record locations. 
mymap <- get_map(location = c(lon = -2.5, lat = 53.5), zoom = 5, maptype = "satellite")
ggmap(mymap) + geom_point(aes(x = Long, y = Lat), size = 0.5, data = temp_data)

# get some biotic data from obis for two species of barnacles with different thermal affinities: 
# the southern species Balanus stellatus and the northern species Semibalanus balanoides. 

uk_eez <- mr_shp("MarineRegions:eez", maxFeatures = NULL, filter = "United Kingdom Exclusive Economic Zone")
uk_eez <- SpatialPolygonsDataFrame(gSimplify(uk_eez, tol = 0.01, topologyPreserve = TRUE), 
data = uk_eez@data)

stellatus <- plot_obis("Chthamalus stellatus", shapefile = uk_eez, myresolution = 0.5, 
                     myzoom = 5, gridded = T)$obis_data
stellatus <- format_obis_date(stellatus)
stellatus <- stellatus[!$eventDate), ]

balanoides <- plot_obis("Semibalanus balanoides", shapefile = uk_eez, myresolution = 0.5, 
                     myzoom = 5, gridded = T)$obis_data
balanoides <- format_obis_date(balanoides)
balanoides <- balanoides[!$eventDate), ]
# draw realized thermal niche by matching occurrences with CEFAS temperature data. This can be a slow process
# that may take a few minutes depending on the size of both data sets to match.
stellatus_temp <- rep(NA, nrow(stellatus))
for(i in 1:length(stellatus_temp)) {
  stellatus_temp[i] <- match_env(env_lon = temp_data$Long, env_lat = temp_data$Lat, env_time = list(year = temp_data$year, month =     temp_data$month), bio_lon = stellatus$decimalLongitude[i], bio_lat = stellatus$decimalLatitude[i], bio_time = list(year = stellatus$year[i], month = stellatus$month[i]), max_dist = 100000, env_variable = temp_data$tC)

balanoides_temp <- rep(NA, nrow(balanoides))
for(i in 1:length(balanoides_temp)) {
  balanoides_temp[i] <- match_env(env_lon = temp_data$Long, env_lat = temp_data$Lat, env_time = list(year = temp_data$year, month =     temp_data$month), bio_lon = balanoides$decimalLongitude[i], bio_lat = balanoides$decimalLatitude[i], bio_time = list(year = balanoides$year[i], month = balanoides$month[i]), max_dist = 100000, env_variable = temp_data$tC)

mydf <- data.frame(temp = c(stellatus_temp, balanoides_temp), 
                   species = c(rep("stellatus", length(stellatus_temp)), rep("balanoides", length(balanoides_temp))))
ggplot(mydf) + geom_density(aes(temp, col = species))


merpWS functions are being developed openly on github.

MarineEcosystemResearchProgramme/merpWS documentation built on May 7, 2019, 2:51 p.m.