knitr::opts_chunk$set(echo = TRUE)
#devtools::load_all()
library(Rocc)
library(knitr)
library(dplyr)

0. Installing and loading package

remotes::install_github("liibre/Rocc")
library(dplyr)
library(Rocc)

1. Download data from different sources

Here, we have a short list of two fern species.

species_search <- c("Asplenium truncorum", "Lindsaea lancea")

We download occurrence records for each species from two sources, speciesLink and GBIF, and combine the results from each source into a single data frame.

speciesLink

data_splink <- list()
for (sp in species_search) {
  data_splink[[sp]] <- rspeciesLink(species = sp, 
                              filename = paste0(gsub(" ", "_", sp), "_splink"))
}

df_splink <- bind_rows(data_splink, .id = "species_search") 
dim(df_splink)
unique(df_splink$species_search)
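To make the role of the .id argument concrete, here is a minimal sketch on toy data. The list toy_list and its record_id values are made up for illustration and do not come from speciesLink; only bind_rows() from dplyr is used.

```r
library(dplyr)

# Toy stand-ins for the per-species download results (values are made up)
toy_list <- list(
  "Asplenium truncorum" = data.frame(record_id = 1:2),
  "Lindsaea lancea"     = data.frame(record_id = 3:5)
)

# .id stores the name of each list element in a new first column,
# so every row remembers which search term produced it
toy_df <- bind_rows(toy_list, .id = "species_search")

dim(toy_df)
unique(toy_df$species_search)
```

Because the list is filled with data_splink[[sp]], its names are the search terms, which is what lets the combined table be traced back to the original query.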

GBIF

data_gbif <- list()

for (sp in species_search) {
  data_gbif[[sp]] <- rgbif2(species = sp, 
                      filename = paste0(gsub(" ", "_", sp), "_gbif"))
}

names(data_gbif) <- species_search
df_gbif <- bind_rows(data_gbif, .id = "species_search")
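Both loops build the output file name the same way: spaces in the species name become underscores and a source suffix is appended. The sketch below isolates that step in base R; make_filename is a hypothetical helper introduced here for illustration and is not part of Rocc.

```r
species_search <- c("Asplenium truncorum", "Lindsaea lancea")

# Hypothetical helper: replace spaces with underscores and append a source suffix
make_filename <- function(sp, source) {
  paste0(gsub(" ", "_", sp), "_", source)
}

# gsub() and paste0() are vectorised, so all names can be built at once
make_filename(species_search, "gbif")
```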

2. Binding data from different sources

df <- bind_dwc(splink_data = df_splink, gbif_data = df_gbif)
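Tables from different sources rarely share exactly the same columns, so they have to be aligned before they can be stacked. The base R sketch below illustrates that general idea on toy data. It is not the implementation of bind_dwc(); the helper align_columns and all values are assumptions made for this example.

```r
# Toy records from two hypothetical sources with partly different columns
splink_toy <- data.frame(
  scientificName  = c("Asplenium truncorum", "Lindsaea lancea"),
  decimalLatitude = c(-22.9, -23.5),
  collectioncode  = c("RB", "SP"),
  stringsAsFactors = FALSE
)
gbif_toy <- data.frame(
  scientificName  = "Lindsaea lancea",
  decimalLatitude = -3.1,
  datasetKey      = "abc-123",
  stringsAsFactors = FALSE
)

# Hypothetical helper: add missing columns as NA and put columns in a common order
align_columns <- function(x, all_cols) {
  for (col in setdiff(all_cols, names(x))) {
    x[[col]] <- NA
  }
  x[, all_cols, drop = FALSE]
}

all_cols <- union(names(splink_toy), names(gbif_toy))
combined <- rbind(
  align_columns(splink_toy, all_cols),
  align_columns(gbif_toy, all_cols)
)
combined
```

Fields that exist in only one source end up as NA for rows from the other source, so no record is dropped when the tables are combined.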

3. Check string in species name

Because the data may come from sources that contain errors, we perform a basic check on the species name strings. To avoid checking the same string repeatedly, we first select the unique entries in the species names.

# Vector of unique entries in species names
species_name_raw <- unique(df$scientificName)

For the unique entries, we will perform a basic check on the string.

species_name_check  <- check_string(species_name_raw)
species_name_check
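To give a feel for what a string check of this kind can detect, here is a deliberately simplified sketch in base R. It is not the logic used by check_string(): the function classify_name, its rules, and its status labels are all made up for illustration.

```r
# Made-up name strings of the kind found in raw occurrence data
names_raw <- c(
  "Lindsaea lancea",
  "Lindsaea lancea (L.) Bedd.",
  "Lindsaea sp.",
  "Lindsaea"
)

# Hypothetical, much-simplified classifier based on word counts and abbreviations
classify_name <- function(x) {
  n_words <- lengths(strsplit(trimws(x), "\\s+"))
  uncertain <- grepl("\\b(sp|cf|aff)\\.", x, perl = TRUE)
  ifelse(uncertain, "indet_or_uncertain",
    ifelse(n_words == 1, "not_binomial",
      ifelse(n_words == 2, "binomial", "binomial_plus_extra")))
}

data.frame(verbatim = names_raw, status = classify_name(names_raw))
```

Working on unique strings keeps such a check cheap, since each distinct name is classified once regardless of how many records carry it.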

Here, we are interested only in the names flagged as possibly_ok or name_w_authors. We now filter the occurrence data to keep only the records whose names fall into these two categories.

verbatimSpecies_ok <- species_name_check$verbatimSpecies[species_name_check$speciesStatus %in% c("possibly_ok", "name_w_authors")]
df_ok <- df[df$scientificName %in% verbatimSpecies_ok, ]
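The filter above is a two-step lookup: first pick the verbatim names with an accepted status, then keep the rows whose name is in that set. The sketch below repeats the same pattern on toy data. The column names mirror those used above, while the rows and the "indet" status value are made up for this example.

```r
# Toy check table (values are made up)
check_toy <- data.frame(
  verbatimSpecies = c("Lindsaea lancea", "Lindsaea lancea (L.) Bedd.", "Lindsaea sp."),
  speciesStatus   = c("possibly_ok", "name_w_authors", "indet"),
  stringsAsFactors = FALSE
)

# Toy occurrence table (values are made up)
occ_toy <- data.frame(
  scientificName = c("Lindsaea lancea", "Lindsaea sp.",
                     "Lindsaea lancea (L.) Bedd.", "Lindsaea sp."),
  stringsAsFactors = FALSE
)

# Step 1: names whose status is in the accepted set
keep_status <- c("possibly_ok", "name_w_authors")
names_ok <- check_toy$verbatimSpecies[check_toy$speciesStatus %in% keep_status]

# Step 2: rows whose name is among the accepted names
# (drop = FALSE keeps a data frame even when only one column is present)
occ_ok <- occ_toy[occ_toy$scientificName %in% names_ok, , drop = FALSE]

c(before = nrow(occ_toy), after = nrow(occ_ok))
```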

This cleaning step reduced the data from nrow(df) to nrow(df_ok) occurrences.

Finally, we can write the resulting occurrence data to disk.

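# Make sure the output folder exists; this does nothing if it is already there
dir.create("results", showWarnings = FALSE)
write.csv(df_ok, "results/occurrence_data.csv", row.names = FALSE)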
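A quick way to confirm that the file was written correctly is to read it back and compare it with the table in memory. The sketch below does this with toy data in a temporary folder, so it leaves no files behind. The folder, file, and values are assumptions for this example only.

```r
# Write to a temporary folder so the sketch does not touch the project directory
out_dir <- file.path(tempdir(), "results")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

# Toy occurrence table (values are made up)
toy <- data.frame(
  scientificName  = c("Asplenium truncorum", "Lindsaea lancea"),
  decimalLatitude = c(-22.9, -3.1),
  stringsAsFactors = FALSE
)

out_file <- file.path(out_dir, "occurrence_data.csv")
write.csv(toy, out_file, row.names = FALSE)

# Read the file back and compare it with the original table
toy_back <- read.csv(out_file, stringsAsFactors = FALSE)
all.equal(toy, toy_back)
```

Setting row.names = FALSE matters for this round trip: without it, write.csv() adds a column of row names that would appear as an extra column when the file is read again.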


saramortara/rocc documentation built on April 3, 2022, 3:41 p.m.