# genotypeDiversity: Genotype Diversity Statistics In polysat: Tools for Polyploid Microsatellite Analysis

## Description

genotypeDiversity calculates diversity statistics based on genotype frequencies, using a distance matrix to assign individuals to genotypes. The Shannon and Simpson functions are also available to calculate these statistics directly from a vector of frequencies.

## Usage

```r
genotypeDiversity(genobject, samples = Samples(genobject),
                  loci = Loci(genobject),
                  d = meandistance.matrix(genobject, samples, loci,
                                          all.distances = TRUE,
                                          distmetric = Lynch.distance),
                  threshold = 0, index = Shannon, ...)

Shannon(p, base = exp(1))

Simpson(p)

Simpson.var(p)
```

## Arguments

| Argument | Description |
|---|---|
| genobject | An object of the class "genambig" (or more generally, "gendata" if a value is supplied to d). If there is more than one population, the PopInfo slot should be filled in. genobject is the dataset to be analyzed, although the genotypes themselves will not be used if d has already been calculated. Missing genotypes, however, will indicate individuals that should be skipped in the analysis. |
| samples | An optional character vector indicating a subset of samples to analyze. |
| loci | An optional character vector indicating a subset of loci to analyze. |
| d | A list such as that produced by meandistance.matrix or meandistance.matrix2 when all.distances = TRUE. The first item in the list is a three-dimensional array, with the first dimension indexed by locus and the second and third dimensions indexed by sample. These are genetic distances between samples, by locus. The second item in the list is the distance matrix averaged across loci. This mean matrix will be used only if all loci are being analyzed. If loci is a subset of the loci found in d, the mean matrix will be recalculated. |
| threshold | The maximum genetic distance between two samples that can be considered to be the same genotype. |
| index | The diversity index to calculate. This should be Shannon, Simpson, or a user-defined function that takes as its first argument a vector of frequencies that sum to one. |
| ... | Additional arguments to pass to index, for example the base argument for Shannon. |
| p | A vector of counts. |
| base | The base of the logarithm for calculating the Shannon index. This is exp(1) for the natural log, or 2 for log base 2. |
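As an illustration of the user-defined option for index, the sketch below defines a custom index in base R. The function name effectiveGenotypes and the choice of statistic (the reciprocal of the summed squared frequencies) are assumptions made for this example and are not part of polysat. The function normalizes its input, so it behaves the same whether it receives counts or frequencies.

```r
# Hypothetical user-defined index: effective number of genotypes.
# The name and the statistic are illustrative, not part of polysat.
effectiveGenotypes <- function(p, ...) {
  p <- p[p > 0]          # ignore empty genotype classes
  freq <- p / sum(p)     # normalize, so counts and frequencies both work
  1 / sum(freq^2)        # reciprocal of the summed squared frequencies
}

# Direct call on a vector of genotype counts:
# frequencies 2/3 and 1/3 give 1 / (5/9) = 1.8
effectiveGenotypes(c(2, 1))
```

Given a dataset and distance list such as mydata and mydist from the example code at the end of this page, a function of this shape could presumably be supplied as `index = effectiveGenotypes`.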

## Details

genotypeDiversity runs assignClones on distance matrices for individual loci and then for all loci, for each separate population. The results of assignClones are used to calculate a vector of genotype frequencies, which is passed to index.
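The tallying step described above can be sketched in base R. The clone assignments below are made-up values standing in for the output of assignClones for one population, and nGenotypes is a hypothetical stand-in for index. This is a sketch of the idea, not the package's internal code.

```r
# Made-up clone assignments for five samples in one population;
# samples that share a number are treated as the same genotype.
clones <- c(a = 1, b = 1, c = 2, d = 3, e = 3)

# Tally how many samples carry each genotype
counts <- as.vector(table(clones))
counts

# Any function whose first argument is this vector can play the role
# of the index; this stand-in simply counts the distinct genotypes.
nGenotypes <- function(p, ...) length(p)
nGenotypes(counts)
```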

Shannon calculates the Shannon index, which is:

$$-\sum_i \frac{p_i}{N} \ln\left(\frac{p_i}{N}\right)$$

(or log base 2 or any other base, using the base argument) given a vector p of genotype counts, where N is the sum of those counts.
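A minimal base R sketch of the Shannon formula, written out for a small vector of counts, may help make the notation concrete. The function name shannonSketch is made up for this illustration and is not the polysat implementation. Dropping zero counts is an added assumption to avoid evaluating 0 * log(0).

```r
# Illustrative re-implementation of the Shannon formula;
# shannonSketch is a made-up name, not the polysat function.
shannonSketch <- function(p, base = exp(1)) {
  N <- sum(p)                      # total number of individuals
  freq <- p[p > 0] / N             # drop zero counts to avoid 0 * log(0)
  -sum(freq * log(freq, base = base))
}

counts <- c(2, 1)                  # two samples share a genotype, one is unique
shannonSketch(counts)              # natural log, about 0.637
shannonSketch(counts, base = 2)    # log base 2, about 0.918
```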

Simpson calculates the Simpson index, which is:

$$\sum_i \frac{p_i (p_i - 1)}{N (N - 1)}$$
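The Simpson formula can likewise be written out in a few lines of base R. The name simpsonSketch is made up for this illustration and is not the polysat function. In this form the index is the probability that two individuals drawn without replacement share a genotype, so lower values correspond to higher diversity.

```r
# Illustrative re-implementation of the Simpson formula;
# simpsonSketch is a made-up name, not the polysat function.
simpsonSketch <- function(p) {
  N <- sum(p)                          # total number of individuals
  sum(p * (p - 1)) / (N * (N - 1))     # undefined when N < 2
}

simpsonSketch(c(2, 1))      # (2*1 + 1*0) / (3*2) = 1/3
simpsonSketch(c(1, 1, 1))   # every genotype unique: 0
simpsonSketch(c(3))         # a single genotype: 1
```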

Simpson.var calculates the variance of the Simpson index:

$$\frac{4N(N-1)(N-2)\sum_i p_i^3 + 2N(N-1)\sum_i p_i^2 - 2N(N-1)(2N-3)\left(\sum_i p_i^2\right)^2}{[N(N-1)]^2}$$
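The variance expression can be sketched in base R as follows. One assumption is made loudly here: the p_i inside this expression are read as proportions (counts divided by N), because plugging in raw counts makes the numerator negative for simple inputs such as c(2, 1). The name simpsonVarSketch is made up, and this is not claimed to be the package's own code.

```r
# Illustrative sketch of the variance formula. ASSUMPTION: p_i in the
# formula are proportions, so the counts are divided by N first.
simpsonVarSketch <- function(p) {
  N <- sum(p)                 # total number of individuals
  f <- p / N                  # proportions (assumed reading of p_i)
  s2 <- sum(f^2)
  s3 <- sum(f^3)
  (4 * N * (N - 1) * (N - 2) * s3 +
     2 * N * (N - 1) * s2 -
     2 * N * (N - 1) * (2 * N - 3) * s2^2) / (N * (N - 1))^2
}

simpsonVarSketch(c(2, 1))     # 8/81, about 0.0988
```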

The variance of the Simpson index can be used to calculate a confidence interval. For example, the result of Simpson plus or minus twice the square root of the result of Simpson.var gives an approximate 95% confidence interval.
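The interval arithmetic can be sketched as a small helper. The function name simpsonCI and the numeric inputs below are invented for illustration. In practice the estimate and variance would come from Simpson and Simpson.var applied to the same vector of counts.

```r
# Hypothetical helper: estimate plus or minus twice the standard error.
simpsonCI <- function(estimate, variance) {
  halfWidth <- 2 * sqrt(variance)
  c(lower = estimate - halfWidth, upper = estimate + halfWidth)
}

# Made-up values: estimate 0.30, variance 0.0025 (standard error 0.05)
simpsonCI(estimate = 0.30, variance = 0.0025)   # lower 0.2, upper 0.4
```

Because the Simpson index is bounded between 0 and 1, an interval built this way can extend past those bounds for small samples.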

## Value

A matrix of diversity index results, with populations in rows and loci in columns. The final column is called "overall" and gives the results when all loci are analyzed together.

## Author(s)

Lindsay V. Clark

## References

Shannon, C. E. (1948) A mathematical theory of communication. Bell System Technical Journal 27:379–423 and 623–656.

Simpson, E. H. (1949) Measurement of diversity. Nature 163:688.

Lowe, A., Harris, S. and Ashton, P. (2004) Ecological Genetics: Design, Analysis, and Application. Wiley-Blackwell.

Arnaud-Haond, S., Duarte, M., Alberto, F. and Serrao, E. A. (2007) Standardizing methods to address clonality in population studies. Molecular Ecology 16:5115–5139.

## See Also

assignClones, alleleDiversity

## Examples

```r
library(polysat)

# set up dataset
mydata <- new("genambig", samples=c("a","b","c"), loci=c("F","G"))
Genotypes(mydata, loci="F") <- list(c(115,118,124), c(115,118,124),
                                   c(121,124))
Genotypes(mydata, loci="G") <- list(c(162,170,174), c(170,172),
                                   c(166,180,182))
Usatnts(mydata) <- c(3,2)

# get genetic distances
mydist <- meandistance.matrix(mydata, all.distances=TRUE)

# calculate diversity under various conditions
genotypeDiversity(mydata, d=mydist)
genotypeDiversity(mydata, d=mydist, base=2)
genotypeDiversity(mydata, d=mydist, threshold=0.3)
genotypeDiversity(mydata, d=mydist, index=Simpson)
genotypeDiversity(mydata, d=mydist, index=Simpson.var)
```