We use geogendivr and geogendivrdata to perform this analysis.
# if (!requireNamespace("devtools", quietly = TRUE)) { install.packages("devtools") }
# library(devtools)
# devtools::install_github("Grelot/geogendivr")
library(geogendivr)
BOLD (Barcode of Life Data System) is a database of DNA barcode sequences from georeferenced specimens. Clusters of these sequences closely approximate species.
geogendivr provides a sample of a BOLD request for the taxon "Pomacanthidae" as a dataset. We use this dataset as an example to test functions of the package geogendivr.
First we need to load the resBold dataset.
## taxonRequest <- "Actinopterygii"
## resBold <- bold_seqspec(taxon=taxonRequest, sepfasta=TRUE)
## load taxon request "Pomacanthidae" sample from BOLD
data(requestPomacanthidaeBOLD)
resBold is a list of objects returned by the bold_seqspec command from the bold package. It contains 2 objects:
data: a dataframe of specimen information (spatial coordinates, taxonomy...)
fasta: a list of DNA barcode sequences
Each row of the dataframe relates to an individual sequence. There are 725 published records, all 725 with sequences, forming 71 BINs (clusters), with specimens from 46 countries, deposited in 27 institutions.
## rows are individuals, columns are information descriptors
dim(resBold$data)
## number of barcode sequences
length(resBold$fasta)
## names of the information fields
names(resBold$data)
For the next steps, the most important fields are species_name, lat, lon and marker_codes.
Reef Life Survey (RLS) is a set of size and abundance data for thousands of reef-dwelling species, recorded on RLS transects across thousands of sites worldwide.
geogendivr provides two Reef Life Survey dataframes:
reefishSurveySpecies: describes reef fish species abundance and biomass for 7100 survey geographical positions.
reefishSurveyEnvSocio: social and environmental data attributed to Reef Life Survey geographical locations.
## load species Reef Life Survey dataframe
data(reefishSurveySpecies)
## a thorough description of this dataframe is available
# help(reefishSurveySpecies)
## load social environmental Reef Life Survey dataframe
data(reefishSurveyEnvSocio)
## a thorough description of this dataframe is available
# help(reefishSurveyEnvSocio)
plot_reefish_survey(reefishSurveyEnvSocio, 500)
Here we visualize the geographical distribution of clusters of Reef Life Survey points (points within 500 km of each other).
We filter and mutate the BOLD dataset to produce a curated dataframe with rows as individual specimens and columns as specimen information. This step adds a new column, sequence, holding each fasta sequence as a string.
The function prepare_bold_res applies 5 filters, including:
marker_code
species_name information
lat or lon coordinates information
## filter and mutate
prparedResBold <- prepare_bold_res(resBold,
                                   marker_code="COI-5P",
                                   species_names=TRUE,
                                   coordinates=TRUE,
                                   ambiguities=TRUE,
                                   min_length=420,
                                   max_length=720
                                   )
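To make the filtering step concrete, here is a minimal base R sketch of filters of this kind applied to a toy dataframe. It is not the implementation of prepare_bold_res. The toy records, the requirement that both lat and lon be present, and the reading of ambiguities=TRUE as "drop sequences containing characters other than A, C, G, T" are all assumptions made for illustration.

```r
# Toy records standing in for the BOLD specimen table (hypothetical values)
toy <- data.frame(
  species_name = c("Centropyge bicolor", "", "Pomacanthus imperator",
                   "Centropyge bicolor", "Centropyge flavissima",
                   "Centropyge flavissima", "Pomacanthus imperator"),
  lat = c(-17.5, 12.1, NA, -18.0, -21.2, -21.3, -4.6),
  lon = c(149.8, 43.2, 55.3, 150.1, 165.4, 165.5, 55.4),
  marker_codes = c("COI-5P", "COI-5P", "COI-5P", "16S",
                   "COI-5P", "COI-5P", "COI-5P"),
  sequence = c(strrep("ACGT", 120),                 # 480 bases, clean
               strrep("ACGT", 120),                 # missing species name
               strrep("ACGT", 120),                 # missing latitude
               strrep("ACGT", 120),                 # wrong marker
               paste0(strrep("ACGT", 119), "ACGN"), # ambiguous base
               strrep("ACGT", 50),                  # too short (200 bases)
               strrep("ACGT", 150)),                # 600 bases, clean
  stringsAsFactors = FALSE
)

# One logical condition per filter (assumed semantics, see the text above)
keep <- with(toy,
  marker_codes == "COI-5P" &
  !is.na(species_name) & nzchar(species_name) &
  !is.na(lat) & !is.na(lon) &
  !grepl("[^ACGT]", sequence) &
  nchar(sequence) >= 420 & nchar(sequence) <= 720
)

curated <- toy[keep, ]
curated[, c("species_name", "lat", "lon", "marker_codes")]
```

Expressing each filter as its own logical condition makes it easy to check how many records each one removes before they are combined.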
As we work on fishes, and later with the Reef Life Survey dataset, we search for synonyms in fishbase to validate species names from the BOLD dataset.
The function fishbase_name_species_bold checks the species_name field and looks up fishbase synonyms. It then adds a new field, fishbase_species_name.
## validate species names
prparedResBold.fishbaseValid <- fishbase_name_species_bold(prparedResBold)
As we work with the Reef Life Survey dataset, we want to keep only species that are described in Reef Life Survey. The function select_reefish_species performs this selection:
reefishBold <- select_reefish_species(prparedResBold.fishbaseValid, reefishSurveySpecies, countSequencesbySpeciesThreshold=2 )
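The selection can be pictured with the following base R sketch. It is not the code of select_reefish_species. The toy data, the helper variable names, and the assumption that the threshold is inclusive (a species is kept when it has at least that many sequences) are hypothetical.

```r
# Toy BOLD records with validated species names (hypothetical values)
bold <- data.frame(
  fishbase_species_name = c("A a", "A a", "B b", "C c", "C c", "C c"),
  stringsAsFactors = FALSE
)

# Species recorded in the survey (hypothetical)
surveySpecies <- c("A a", "B b")

# Assumed meaning of the threshold: minimum number of sequences per species
threshold <- 2

# Count sequences per species and keep species that reach the threshold
counts <- table(bold$fishbase_species_name)
enough <- names(counts)[counts >= threshold]

# Keep only records of species that are both well sampled and in the survey
keepSpecies <- intersect(enough, surveySpecies)
selected <- bold[bold$fishbase_species_name %in% keepSpecies, , drop = FALSE]
selected
```

In this toy case "B b" is dropped for having a single sequence and "C c" is dropped for being absent from the survey list.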
To work with spatial environmental data from Reef Life Survey, we transform our BOLD dataframe into a spatial points object with the appropriate projection.
reefishBold.sp <- spatialpoints_bold(reefishBold, projectionCRS="+init=epsg:3347")
We calculate a buffer of 250 km around each RLS survey point. We then generate a presence/absence matrix recording whether each georeferenced BOLD sequence falls within the buffer of each RLS survey point.
boldWithinRLS <- sequences_within_buffer(latitude=reefishSurveyEnvSocio$SiteLatitude, longitude=reefishSurveyEnvSocio$SiteLongitude, boldSp=reefishBold.sp, bufferDistance=250, projection="+init=epsg:3347" )
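The shape of such a presence/absence matrix can be illustrated with a small base R sketch. This is not how sequences_within_buffer works internally: instead of projected buffers it uses great-circle (haversine) distances on a sphere of radius 6371 km, and the function name haversine_km and the coordinates are invented for the example.

```r
# Great-circle distance in km between points given in decimal degrees
# (hypothetical helper, spherical Earth approximation)
haversine_km <- function(lat1, lon1, lat2, lon2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius_km * asin(pmin(1, sqrt(a)))
}

# Toy survey sites and sequence locations (hypothetical coordinates)
sites <- data.frame(lat = c(0, 10), lon = c(0, 10))
seqs  <- data.frame(lat = c(0.5, 10.2, 40), lon = c(0.5, 9.9, 40))

bufferDistance <- 250  # km

# Rows are sequences, columns are survey sites; TRUE means "within the buffer"
inBuffer <- outer(
  seq_len(nrow(seqs)), seq_len(nrow(sites)),
  function(i, j) {
    haversine_km(seqs$lat[i], seqs$lon[i], sites$lat[j], sites$lon[j]) <= bufferDistance
  }
)
dimnames(inBuffer) <- list(paste0("seq", seq_len(nrow(seqs))),
                           paste0("site", seq_len(nrow(sites))))
inBuffer
```

A sequence can fall inside several buffers or none, so rows may contain any number of TRUE values.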
We gather together sequences from the same species located within the same RLS Survey geographical buffer. Then sequences are aligned and nucleotide diversity is calculated for each species within each RLS Survey geographical buffer.
nucdivSpecies <- species_nucleotide_diversity(infobold=reefishBold, sequenceWithinBuffer=boldWithinRLS, MinimumNumberOfSequencesBySpecies=3 )
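As an illustration of the quantity being computed, the sketch below calculates nucleotide diversity as the mean proportion of differing sites over all pairs of aligned sequences, then applies it per species. The exact estimator and alignment method used by species_nucleotide_diversity are not stated here, so this is an assumed definition, and the function name nucleotide_diversity and the toy sequences are hypothetical.

```r
# Mean pairwise proportion of differing sites among aligned sequences
# (assumes equal-length, already aligned sequences)
nucleotide_diversity <- function(seqs) {
  stopifnot(length(seqs) >= 2, length(unique(nchar(seqs))) == 1)
  bases <- do.call(rbind, strsplit(seqs, ""))  # one row per sequence
  pairs <- combn(nrow(bases), 2)               # all pairs of sequences
  diffs <- apply(pairs, 2, function(p) sum(bases[p[1], ] != bases[p[2], ]))
  mean(diffs) / ncol(bases)
}

# Toy aligned sequences grouped by species (hypothetical values)
records <- data.frame(
  species  = c("A a", "A a", "A a", "B b", "B b"),
  sequence = c("ACGTACGT", "ACGTACGA", "ACGAACGT", "ACGTACGT", "ACGTACGT"),
  stringsAsFactors = FALSE
)

# One diversity value per species
nucdivToy <- sapply(split(records$sequence, records$species), nucleotide_diversity)
nucdivToy
```

In the workflow above the grouping is by species within each survey buffer, which would correspond to splitting on both the species and the buffer identifier.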
nucdivSpeciesSurveyInfo <- merge_info_nucdiv(nucdivSpecies, reefishSurveyEnvSocio, reefishSurveySpecies)