README.md

dispersalDiversity

This is a collection of R functions to facilitate the analysis of structure in biological communities. We originally developed these functions to describe and compare parental structure for dispersed seeds (Scofield et al. 2012 American Naturalist 180: 719-732), but their usefulness extents to biodiversity data or any other diversity-like data that can be expressed with this same data structure.

When genotypes are available for individuals under study, allelic diversity may be described and compared using functions having similar conceptual and statistical principles, as described in Sork et al. (in press).

NOTE: The version used for Scofield et al. (2012) has a different interface and lacks an R package structure. To use this version, get the release tagged 0.1, and simply source the files of interest.

These statistical tools were developed in collaboration with Peter Smouse (Rutgers University) and Victoria Sork (UCLA) and were funded in part by U.S. National Science Foundation awards NSF-DEB-0514956 and NSF-DEB-0516529.

The Mann-Whitney-Wilcoxon nested ranks test we originally provided here has been made into an R package and is available in its own repository. It can be installed via Github:

devtools::install_github("douglasgscofield/nestedRanksTest")
vignette("nestedRanksTest")

Installation

Install from github using devtools:

install.packages("devtools")
devtools::install_github("douglasgscofield/dispersalDiversity")
library(dispersalDiversity)

Prerequisites

This package requires the readGenalex package v1.0 available from CRAN, which is a prerequisite for this package and will be installed by default.

For producing plots, many find aesthetic colour choices to be available through palettes in the RColorBrewer package. membershipPlot (see below) will use this package if it is available. If it is not installed, get it from CRAN with:

install.package("RColorBrewer")

Input requirements

Most functions take as input a simple data structure, an object of class divtable which is a table of site (rows) by source (columns) counts, which each cell representing the number of times that source/species/type was observed at that site.

One method to create a random divtable:

n.sites <- 5
n.sources <- 10
n.samples <- 120
## data frame of site-source pairs
t <- data.frame(site = sample(n.sites, n.samples, replace = TRUE),
                source = round(runif(n.samples) * n.sources + 0.5))

## site-by-source matrix
m1 <- do.call(rbind, lapply(split(t, t$site), function(x) table(x$source)))
# this creates a class c("matrix")
m1 <- as.divtable(m1)

## or use xtabs
m2 <- xtabs(data = t)
## this creates a class c("xtabs","table")
m2 <- as.divtable(m2)

Membership plots

One of the first things you may want to do is produce a membership plot. Membership plots provide a visual representation of the site-by-source table. Note that the plot shown below does not match that produced by the example statements.

    data(granaries_2004_qlob)
    plot(granaries_2004_qlob)

membershipPlot output in black and white

If fill.method = "colour" and the library RColorBrewer is available, then it is used to pick the colours used for filling groups.

membershipPlot output in colour

Singleton sources (those that appear just once in just one site) are distinguished using a white background, while multiton sources (those that appear multiple times but still in just one site) can be distinguished with a gray background using the option distinguish.multiton = TRUE. Other options are provided for controlling labelling of the plot and producing output to PDF or PostScript files.

Both plots show that if the number of categories requiring colour/fill is sufficiently large, hatching is combined with colours. Both plots were produced with distinguish.multiton = FALSE.

Further published examples of membership plots produced by this code or earlier versions of this code can be seen Scofield et al. 2010, 2011, 2012.

Diversity statistics

The function diversity calculates PMI statistics (Grivet et al. 2005, Scofield et al. 2010, Scofield et al. 2011) and dispersal diversity statistics(Scofield et al. 2012) statistics (qgg, αg, etc.). For a divtable table holding site-by-species counts, the function calculates PMI and diversity statistics three different ways, each of which treats possible bias differently and each of which is returned in an otherwise identical sublist:

Several additional functions are provided.

Function | Description -------- | ----------- alphaDiversityTest(tab) | Test for differences in α diversity among sites within a single dataset alphaContrastTest(tab.a, tab.b) | Test whether there is a difference in the α diversity between two datasets alphaContrastTest3(tab.a, tab.b, tab.c) | Test whether there is a difference in the α diversity among three datasets pairwiseMeanTest(tab) | Test whether mean pairwise divergence/overlap among sites is different from the null espectation gammaContrastTest(tab.a, tab.b) | Test whether there is a difference in the γ diversity between two datasets gammaContrastTest3(tab.a, tab.b, tab.c) | Test whether there is a difference in the γ diversity among three datasets gammaAccum(tab) | Permute species among sites and accumulate gamma diversity across permutations

plot() methods are provided for the results of the above tests. For example, using plot() on the result of alphaDiversityTest() might look like

plotAlphaTest example

Using plot() on the result of pairwiseMeanTest() might look like:

plotPairwiseMeanTest example

plotPairwiseMatrix()

Provides a function for plotting pairwise diversity matrices as returned by the diversity() function, examples of which can be seen in Figure 4A-C of Scofield et al. Am Nat.

plotPairwiseMatrix() : Create a visual plot of pairwise divergence or overlap values as calculated by pmiDiversity()

For example, with tab defined as above, plot the divergence matrix based on rgg calculations, labelling the axes "Seed Pool", using the following code:

d <- diversity(tab)
plotPairwiseMatrix(d$r$divergence.mat,
                   pairwise.mean = d$r$divergence,
                   statistic = "divergence",
                   axis.label = "Seed Pool")

plotPairwiseMatrix example

Gamma accumulation

dispersalDiversity provides functions for calculating γ accumulation across sites, and plotting the result, examples of which can be seen in Figure 4D-F of Scofield et al. Am Nat.

A typical workflow using these functions would be:

rga.result <- gammaAccum(tab)
plot(rga.result)

gammaAccum(tab)

Perform a γ diversity accumulation on the site-by-source data in tab. The result is returned in a list, which may be passed to plot to plot the result. Several arguments control the method of accumulation and value of γ calculated. Only the defaults have been tested; the others were developed while exploring the data and must be considered experimental.

tab : Site-by-source table, same format as that passed to pmiDiversity()

gamma.method : Calculate γ using "r" (default), "q.nielsen" or "q" method (see paper)

resample.method : "permute" (default) or "bootstrap"; whether to resample sites without ("permute") or with ("bootstrap") replacement

accum.method : "random" (default) or "proximity". If proximity is used, then distance.file must be supplied

distance.file : A file or data.frame containing three columns of data, with the header/column names being pool, X, and Y, containing the spatial locations of the seed pools named in the row names of tab; only used with accum.method="proximity"

plot(rga.result)

Create a visual plot of γ accumulation results from gammaAccum().

plotGammaAccum example

Additional functions

The following functions typically won't be used separately, use runGammaAccum() instead.

gammaAccum() : Workhorse function for γ accumulation

gammaAccumStats() : Extracts stats from the result of gammaAccum()

runGammaAccumSimple() : Wrapper that runs and then returns stats from gammaAccum()

Allelic Diversity

These functions calculate allelic alpha, beta and gamma diversity as described by Sork et al. (unpublished), following the same conceptual and statistical principles described in Scofield et al. (2012). These have been used to calculate the structure of allelic diversity for complete progeny genotypes as well as their decomposed male and female gametes, and to compare alpha and gamma diversity of progeny dispersed relatively short vs. long distances.

Input requirements

Input begins as a file of genotypes in GenAlEx format, which is read using the [readGenalex] R package (https://github.com/douglasgscofield/readGenalex) available via Github:

devtools::install_github("douglasgscofield/readGenalex")

Usage

The augmented data.frame returned by readGenalex() is then processed into a list of tables, one per locus, using allele.createTableList(). If the true ploidy of the input data does not the ploidy of the GenAlEx file, for example if haploid gametes are represented by a pair of homozygous alleles, then use the new.ploidy=1 argument to allele.createTableList() to reduce the ploidy to its true level.

The list of tables is analysed as a unit by allele.pmiDiversity(), and can be passed to one of the contrast functions for testing against another list of tables.

Workflow

The workflow to calculate basic allelic diversity statistics:

dat <- readGenalex("GenAlEx-format-file-of-genotypes.txt")
gt <- createAlleleTables(dat)
div <- diversityMultilocus(gt)

For comparing allele diversity between two different samples:

dat1 <- readGenalex("file-of-genotypes-sample-1.txt")
dat2 <- readGenalex("file-of-genotypes-sample-2.txt")
gt1 <- createAlleleTables(dat1)
gt2 <- createAlleleTables(dat2)
alpha.contrast <- alphaContrastTest(gt1, gt2)
gamma.contrast <- gammaContrastTest(gt1, gt2)

For calculating and plotting gamma accumulation curves across all loci:

dat <- readGenalex("genotypes.txt")
lst <- createAlleleTables(dat)
allele.rga.result <- gammaAccum(lst)
plot(allele.rga.result)

Functions

diversityMultilocus() : The function calculating diversity for a set of loci. The single argument is a list produced by createAlleleTables(), and it uses the function diversitySingleLocus().

createAlleleTables() : Take genotypes read by readGenalex(), produce a list of allele count tables used by the other functions. Each entry of the list is, for each locus, a table of site x allele counts, with row names being the site names, and column names being the names given to the individual alleles.

diversitySingleLocus() : The single argument is, for a single locus, a table of site × allele counts, with row names being the site names, and column names being the names given to the individual alleles.

alphaDiversityTest(lst) : Test whether there is a difference in the alpha diversity among patches in an allele diversity dataset, that is, whether β = 1 or δ = 0 across a collection of patches at a site (see Sork et al.).

alphaContrastTest(lst.a, lst.b) : Test whether there is a difference in the alpha diversity between two lists of allele diversity datasets.

gammaContrastTest(lst.a, lst.b) : Test whether there is a difference in the gamma diversity between two allele diversity datasets.

gammaAccum(lst) : Perform a gamma diversity accumulation on the site-by-source data in tab. Several arguments control the method of accumulation and value of gamma calculated. Other arguments are identical to gammaAccum(). Only the defaults have been tested; the others were developed while exploring the data and must be considered experimental. The result is returned in a custom class which may be passed to plot() to plot the result.

We also provide the datasets granaries_2002_Qlob and granaries_2004_Qlob, which include assignments of Quercus lobata acorns harvested from acorn woodpecker granaries in 2002 and 2004 to seed source trees.

References

Grivet D, Smouse PE, Sork VL. 2005. A novel approach to an old problem: tracking dispersed seeds. Molecular Ecology 14: 3585-3595.

Nielsen R, Tarpy DR, Reeve HK. 2003. Estimating effective paternity number in social insects and the effective number of alleles in a population. Molecular Ecology 12: 3157-3164.

Scofield DG, Smouse PE, Karubian J, Sork VL. 2012. Use of alpha, beta, and gamma diversity measures to characterize seed dispersal by animals. American Naturalist 180: 719-732, supplement, data.

Scofield DG, Alfaro VR, Sork VL, Grivet D, Martinez E, Papp J, Pluess AR, Koenig WD, Smouse PE. 2011. Foraging patterns of acorn woodpeckers (Melanerpes formicivorus) on valley oak (Quercus lobata Née) in two California oak savanna-woodlands. Oecologia 166: 187-196, supplement.

Scofield DG, Sork VL, Smouse PE. 2010. Influence of acorn woodpecker social behaviour on transport of coast live oak (Quercus agrifolia) acorns in a southern California oak savanna. Journal of Ecology 98: 561-572, supplement.

Sork VL, Smouse PE, Grivet D, Scofield DG. (In press) Impact of asymmetric male and female gamete dispersal on allelic diversity and spatial genetic structure in valley oak (Quercus lobata Née). Evolutionary Ecology.



douglasgscofield/dispersalDiversity documentation built on March 30, 2021, 9:50 a.m.