Prerequisites

library(rgeogendiv)

BOLD dataset

BOLD (Barcode of Life Data System) is a database of DNA barcode sequences, short marker sequences that closely approximate species. Many of its records come from georeferenced specimens.

We use the bold package to download the specimen and sequence records for the family Pomacanthidae (marine angelfishes).

library(bold)
taxonRequest <- "Pomacanthidae"
resBold <- bold::bold_seqspec(taxon=taxonRequest, sepfasta=TRUE)

Prepare dataset

We filter and mutate the georeferenced sequence dataset from boldsystems.org to produce a curated data frame with one row per specimen and one column per specimen attribute. We also add a new column, sequence, that holds each DNA sequence as a string.

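The function prepare_bold_res applies 5 filters, set through its arguments (marker code, species names, coordinates, ambiguities, and sequence length):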

## filter and mutate
preResBold <- prepare_bold_res(resBold,
                               marker_code="COI-5P",
                               species_names=TRUE,
                               coordinates=TRUE,
                               ambiguities=TRUE,
                               min_length=420,
                               max_length=720
                               )
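The kind of filtering described above can be sketched in base R on a toy data frame. This is only an illustration: the column names (marker, species_name, lat, lon, sequence) and the exact rules, including reading ambiguities=TRUE as "drop sequences containing ambiguous bases", are assumptions and not the internals of prepare_bold_res.

```r
# Toy specimen table; all column names and values are hypothetical.
toy <- data.frame(
  marker       = c("COI-5P", "COI-5P", "COI-5P", "COI-5P", "CYTB"),
  species_name = c("Sp a", "", "Sp b", "Sp c", "Sp d"),
  lat          = c(-17.5, 10.0, NA, 21.3, 5.0),
  lon          = c(-149.8, 45.0, 120.0, -157.8, 73.0),
  sequence     = c(strrep("ACGT", 150), strrep("ACGT", 150),
                   strrep("ACGT", 150),
                   paste0(strrep("ACGT", 120), "N"),
                   strrep("ACGT", 150)),
  stringsAsFactors = FALSE
)

# Sequence length must fall within [min_length, max_length].
length_ok <- nchar(toy$sequence) >= 420 & nchar(toy$sequence) <= 720

keep <- toy$marker == "COI-5P" &              # marker code
  nzchar(toy$species_name) &                  # species name present
  !is.na(toy$lat) & !is.na(toy$lon) &         # coordinates present
  !grepl("[^ACGT]", toy$sequence) &           # no ambiguous bases (assumed rule)
  length_ok

curated <- toy[keep, ]
curated$species_name
```

Only the first toy record passes every filter; each of the others fails exactly one of them.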

Build grid world map

The grid is composed of square cells, each siteSize meters on a side, that we call sites. By default, the grid is built on a worldmap in Behrmann projection. In this example we build a grid whose sites are 260 kilometers on a side.

grid.sp <- grid_spatialpolygons(siteSize=260000)
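To make the idea of a projected square grid concrete, here is a minimal base R sketch that projects longitude/latitude to Behrmann coordinates and finds the grid cell a point falls in. It assumes a spherical Earth of radius 6371 km and a grid aligned on the projection origin; behrmann_xy and site_index are hypothetical helpers, not functions of rgeogendiv, and grid_spatialpolygons may build its grid differently.

```r
# Spherical Behrmann projection: cylindrical equal-area with
# standard parallels at 30 degrees (assumption: sphere, not ellipsoid).
behrmann_xy <- function(lon, lat, radius = 6371000) {
  phi0 <- 30 * pi / 180
  x <- radius * (lon * pi / 180) * cos(phi0)
  y <- radius * sin(lat * pi / 180) / cos(phi0)
  cbind(x = x, y = y)
}

# Column/row index of the square site containing each point,
# counted from the projection origin (lon = 0, lat = 0).
site_index <- function(lon, lat, siteSize = 260000) {
  xy <- behrmann_xy(lon, lat)
  data.frame(col = floor(xy[, "x"] / siteSize),
             row = floor(xy[, "y"] / siteSize))
}

site_index(lon = c(0.5, 150), lat = c(0.5, -20))
```

Because the projection is equal-area, every cell of such a grid covers the same surface on the ground, which keeps per-site statistics comparable across latitudes.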

Generate the matrix of presence/absence of a specimen in sites from the worldmap grid

specimenIntersectSites <- specimen_intersect_site(specimen.df=preResBold, grid.sp=grid.sp)
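The shape of a presence/absence matrix can be illustrated with base R. This is a sketch on made-up data: the specimen and site labels are hypothetical, and the object returned by specimen_intersect_site may be organised differently.

```r
# Toy assignment of specimens to sites (hypothetical labels).
specimens <- data.frame(
  specimen = c("s1", "s2", "s3", "s4"),
  site     = c("A", "A", "B", "C"),
  stringsAsFactors = FALSE
)

# Rows are specimens, columns are sites; TRUE marks presence.
presence <- unclass(table(specimens$specimen, specimens$site)) > 0
presence
```

Each specimen has a single pair of coordinates, so every row of such a matrix contains exactly one TRUE.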

Nucleotide diversity

By species

We group specimens of the same species located within the same site of the grid. Their sequences are then aligned, and nucleotide diversity is calculated for each species within each site.

nucdivSpecies <- nucleotide_diversity_species(specimen.df=preResBold,
                                              sequenceIntersectSites=specimenIntersectSites,
                                              MinimumNumberOfSequencesBySpecies=3
                                              )
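As a rough illustration of the quantity being computed, nucleotide diversity can be taken as the average proportion of differing positions over all pairs of aligned sequences. The sketch below assumes sequences that are already aligned, of equal length and without gaps; nucleotide_diversity is a hypothetical helper, and the package may rely on a different estimator or on an external alignment tool.

```r
# Mean pairwise proportion of differing positions among aligned sequences.
nucleotide_diversity <- function(seqs) {
  stopifnot(length(seqs) >= 2, length(unique(nchar(seqs))) == 1)
  bases <- do.call(rbind, strsplit(seqs, ""))   # one row per sequence
  pairs <- combn(nrow(bases), 2)                # all pairs of sequences
  diffs <- apply(pairs, 2, function(p) {
    mean(bases[p[1], ] != bases[p[2], ])
  })
  mean(diffs)
}

# Toy alignment: 3 sequences of 8 bases, 4 differences over 3 pairs.
aligned <- c("ACGTACGT", "ACGTACGA", "ACGAACGT")
nucleotide_diversity(aligned)   # 4 / (3 * 8) = 1/6
```

With very few sequences this average rests on only a handful of pairs, which is one reason to require a minimum number of sequences per species and site, as MinimumNumberOfSequencesBySpecies=3 does above.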

By sites

Once we have the nucleotide diversity of each species, we calculate the mean species nucleotide diversity for each site of the worldmap grid.

nucdivSites <- nucleotide_diversity_sites(nucdivSpecies)
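The averaging step amounts to a grouped mean. A minimal base R sketch, on a made-up table whose column names (site, species, nucdiv) are assumptions rather than the actual structure of nucdivSpecies:

```r
# Toy per-species, per-site nucleotide diversity (hypothetical values).
nucdiv_species <- data.frame(
  site    = c("A", "A", "B"),
  species = c("sp1", "sp2", "sp1"),
  nucdiv  = c(0.002, 0.004, 0.010),
  stringsAsFactors = FALSE
)

# Mean species nucleotide diversity within each site.
nucdiv_sites <- aggregate(nucdiv ~ site, data = nucdiv_species, FUN = mean)
nucdiv_sites
```

Each species contributes one value per site, so a species with many sequences does not outweigh a species with few.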

Worldmap grid of mean species nucleotide diversity

We assign a mean species nucleotide diversity value to each site in the worldmap grid.

nucdivGrid <- nucleotide_diversity_grid(nucdivSites, grid.sp)
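Conceptually, this step is a left join of the per-site values onto the full list of grid sites, so that sites without data stay in the grid with a missing value. The sketch below uses plain data frames with hypothetical site identifiers; the real grid is a spatial object, and nucleotide_diversity_grid may proceed differently.

```r
# All sites of a toy grid, and the subset that has a diversity value.
grid_sites   <- data.frame(site = c("A", "B", "C", "D"),
                           stringsAsFactors = FALSE)
nucdiv_sites <- data.frame(site = c("A", "C"),
                           nucdiv = c(0.003, 0.010),
                           stringsAsFactors = FALSE)

# Keep every grid site; sites without data get NA.
nucdiv_grid <- merge(grid_sites, nucdiv_sites, by = "site", all.x = TRUE)
nucdiv_grid
```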

Then, we can plot the worldmap grid of nucleotide diversity.

gg <- plot_grid(nucdivGrid)
gg


Grelot/rgeogendiv documentation built on Dec. 22, 2020, 5:51 a.m.