In Grelot/rgeogendiv: Geographic Genetic Diversity

Prerequisites

library(rgeogendiv)

BOLD dataset

BOLD (Barcode of Life Data System) is a database of DNA barcode sequences from georeferenced specimens. Clusters of these barcode sequences closely approximate species.

We use the bold package to download a set of georeferenced sequences for the family Pomacanthidae (marine angelfishes).

library(bold)
taxonRequest <- "Pomacanthidae"
resBold <- bold::bold_seqspec(taxon=taxonRequest, sepfasta=TRUE)

Prepare dataset

We filter and mutate the georeferenced sequence dataset from boldsystems.org to produce a curated data frame in which each row is an individual specimen and each column holds one piece of specimen information. We also add a new column, sequence, that stores each DNA sequence as a string.

The function prepare_bold_res applies five filters:

Select specimens with the given marker_code
Remove specimens with no species_name information
Remove specimens with no lat or lon coordinates
Remove specimens with IUPAC ambiguity codes in their DNA sequences
Select specimens whose DNA sequences fall within a given range of lengths, in bp

## filter and mutate
preResBold <- prepare_bold_res(resBold,
                               marker_code = "COI-5P",
                               species_names = TRUE,
                               coordinates = TRUE,
                               ambiguities = TRUE,
                               min_length = 420,
                               max_length = 720)
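To make the five filters concrete, here is a minimal base-R sketch applied to a small made-up data frame. It is not the implementation of prepare_bold_res: the helper filter_specimens, the toy records, and the short length bounds are hypothetical, and the column names markercode, species_name, lat and lon are assumed to mirror the fields of a BOLD download.

```r
# Toy stand-in for BOLD specimen records; all values are made up for illustration.
specimens <- data.frame(
  markercode   = c("COI-5P", "COI-5P", "COI-3P", "COI-5P", "COI-5P"),
  species_name = c("Centropyge bispinosa", "", "Centropyge loricula",
                   "Pomacanthus imperator", "Pomacanthus imperator"),
  lat          = c(-17.5, -17.5, 20.1, NA, -4.6),
  lon          = c(-149.8, -149.8, -155.5, 55.4, 55.4),
  sequence     = c("ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAC",
                   "ACGTACGTAC", "ACGTNCGTAC"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep only the rows that pass all five filters.
filter_specimens <- function(df, marker_code, min_length, max_length) {
  len <- nchar(df$sequence)
  keep <- df$markercode == marker_code &                  # 1. requested marker
    !is.na(df$species_name) & df$species_name != "" &     # 2. species name present
    !is.na(df$lat) & !is.na(df$lon) &                     # 3. coordinates present
    !grepl("[^ACGT]", toupper(df$sequence)) &             # 4. only A, C, G, T allowed
    len >= min_length & len <= max_length                 # 5. length within range
  df[keep, , drop = FALSE]
}

# Short length bounds are used only because the toy sequences are 10 bp long.
curated <- filter_specimens(specimens, "COI-5P", min_length = 8, max_length = 12)
curated
```

Note that the fourth test in this sketch rejects any character other than A, C, G or T, so gap characters are excluded along with IUPAC ambiguity codes; the package may treat gaps differently.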

Build grid world map

The grid is composed of adjacent square cells, called sites, whose sides measure siteSize meters. By default, the grid is built on a world map in the Behrmann projection. In this example, we build a grid whose sites are 260 kilometers on a side.

grid.sp <- grid_spatialpolygons(siteSize=260000)
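The geometry behind such a grid can be sketched in base R. The Behrmann projection is a cylindrical equal-area projection with a standard parallel at 30 degrees, so square cells of equal side cover equal areas on the ground. The sketch below assumes a spherical Earth of radius 6,371 km, and the helpers behrmann_xy and site_index are hypothetical; grid_spatialpolygons may use an ellipsoid and a different site numbering.

```r
# Spherical Behrmann projection (cylindrical equal-area, standard parallel 30 degrees).
# The radius is an assumption made for this sketch.
earth_radius <- 6371000          # meters
phi0 <- 30 * pi / 180            # standard parallel in radians

# Project longitude/latitude (degrees) to Behrmann x/y (meters).
behrmann_xy <- function(lon, lat) {
  data.frame(
    x = earth_radius * (lon * pi / 180) * cos(phi0),
    y = earth_radius * sin(lat * pi / 180) / cos(phi0)
  )
}

site_size <- 260000                              # side of one site, in meters
x_max <- earth_radius * pi * cos(phi0)           # half-width of the projected world
y_max <- earth_radius / cos(phi0)                # half-height of the projected world
n_cols <- ceiling(2 * x_max / site_size)         # sites per row
n_rows <- ceiling(2 * y_max / site_size)         # sites per column

# Index of the site containing each point, numbered row by row
# starting from the south-west corner of the map.
site_index <- function(lon, lat) {
  xy <- behrmann_xy(lon, lat)
  col <- pmin(pmax(floor((xy$x + x_max) / site_size) + 1, 1), n_cols)
  row <- pmin(pmax(floor((xy$y + y_max) / site_size) + 1, 1), n_rows)
  (row - 1) * n_cols + col
}

c(n_cols = n_cols, n_rows = n_rows)
site_index(lon = c(-149.8, 55.4), lat = c(-17.5, -4.6))
```

Because the last row and column are rounded up with ceiling, the outermost sites extend slightly beyond the projected map edge in this sketch.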

Generate the matrix of presence/absence of a specimen in sites from the worldmap grid

specimenIntersectSites <- specimen_intersect_site(specimen.df=preResBold, grid.sp=grid.sp)
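As an illustration of what a presence/absence matrix looks like, the base-R sketch below builds one from a hypothetical vector of site assignments. The specimen identifiers and site numbers are invented, and the orientation (specimens in rows, sites in columns) is an assumption; the object returned by specimen_intersect_site may be organized differently.

```r
# Hypothetical specimen-to-site assignments (site ids are arbitrary here).
specimen_id <- c("sp1", "sp2", "sp3", "sp4")
site_id     <- c(12, 12, 40, 7)

# One column per occupied site, one row per specimen;
# TRUE marks the site in which the specimen was found.
sites <- sort(unique(site_id))
presence <- sapply(sites, function(k) site_id == k)
dimnames(presence) <- list(specimen_id, as.character(sites))
presence

# Number of specimens recorded in each site
colSums(presence)
```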

Nucleotide diversity

By species

We group specimens of the same species that are located within the same site of the grid. The sequences in each group are then aligned, and nucleotide diversity is calculated for each species within each site.

nucdivSpecies <- nucleotide_diversity_species(specimen.df = preResBold,
                                              sequenceIntersectSites = specimenIntersectSites,
                                              MinimumNumberOfSequencesBySpecies = 3)
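Nucleotide diversity is commonly defined as the mean proportion of differing positions over all pairs of aligned sequences. The base-R sketch below computes it for sequences that are assumed to be already aligned and of equal length, then applies it per species and site, skipping groups with fewer than three sequences, which is how the name MinimumNumberOfSequencesBySpecies suggests the threshold is used. The helper nucleotide_diversity and the toy records are hypothetical; this is not the code of nucleotide_diversity_species.

```r
# Mean proportion of differing positions over all pairs of aligned sequences.
nucleotide_diversity <- function(aligned) {
  stopifnot(length(aligned) >= 2, length(unique(nchar(aligned))) == 1)
  bases <- do.call(rbind, strsplit(aligned, ""))   # one row per sequence
  pairs <- combn(nrow(bases), 2)                   # all unordered pairs
  diffs <- apply(pairs, 2, function(p) mean(bases[p[1], ] != bases[p[2], ]))
  mean(diffs)
}

# Made-up, pre-aligned records: four sequences of one species, two of another.
records <- data.frame(
  species  = c(rep("Centropyge bispinosa", 4), rep("Pomacanthus imperator", 2)),
  site     = c(1, 1, 1, 1, 1, 1),
  sequence = c("ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAT",
               "ACGTACGTAC", "ACGTACGTAA"),
  stringsAsFactors = FALSE
)

# Group sequences by species and site, then drop groups below the threshold.
min_sequences <- 3
groups <- split(records$sequence, list(records$species, records$site), drop = TRUE)
groups <- groups[lengths(groups) >= min_sequences]

pi_by_group <- vapply(groups, nucleotide_diversity, numeric(1))
pi_by_group
```

In this toy dataset only the first species reaches the threshold, so the second one is left out of the result.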

By sites

Once the nucleotide diversity of each species has been computed, we calculate the mean species nucleotide diversity for each site of the world map grid.

nucdivSites <- nucleotide_diversity_sites(nucdivSpecies)
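The averaging step amounts to a grouped mean. A minimal base-R sketch follows, using a made-up table of per-species values; the column names site, species and nucdiv are assumptions and may not match the structure of nucdivSpecies.

```r
# Hypothetical per-species nucleotide diversity values, one row per species and site.
species_pi <- data.frame(
  site    = c(12, 12, 40),
  species = c("A", "B", "A"),
  nucdiv  = c(0.002, 0.006, 0.010)
)

# Mean species nucleotide diversity within each site.
site_pi <- aggregate(nucdiv ~ site, data = species_pi, FUN = mean)
site_pi
```

Each species contributes equally to the site mean in this sketch, regardless of how many sequences it has.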

Worldmap grid of mean species nucleotide diversity

We assign a mean species nucleotide diversity value to each site in the worldmap grid.

nucdivGrid <- nucleotide_diversity_grid(nucdivSites, grid.sp)
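Conceptually, this step is a lookup from site identifiers to values, with sites that have no data left empty. The base-R sketch below shows the idea with a hypothetical five-site grid; the column names are assumptions, and nucleotide_diversity_grid works on a spatial object rather than a plain data frame.

```r
# Hypothetical grid of five sites and mean diversity values for two of them.
grid_sites <- data.frame(site = 1:5)
site_pi <- data.frame(site = c(2, 5), nucdiv = c(0.004, 0.010))

# Attach each site's value; sites without data receive NA.
grid_sites$nucdiv <- site_pi$nucdiv[match(grid_sites$site, site_pi$site)]
grid_sites
```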

Then we can plot the world map grid of nucleotide diversity.

gg <- plot_grid(nucdivGrid)
gg

Grelot/rgeogendiv documentation built on Dec. 22, 2020, 5:51 a.m.
