Produce a basic summary table for population genetic analyses.
Description
For the poppr package description, please see package?poppr.
This function allows the user to quickly view indices of heterozygosity,
evenness, and linkage to aid in the decision of a path to further analyze
a specified dataset. It natively takes genind and genclone objects, but can
convert any raw data formats that adegenet can take (fstat, structure,
genetix, and genpop) as well as genalex files exported into a csv format
(see read.genalex for details).
Usage
Arguments
dat 
a 
total 
When 
sublist 
a list of character strings or integers to indicate specific
population names (accessed via 
blacklist 
a list of character strings or integers to indicate specific populations to be removed from analysis. Defaults to NULL. 
sample 
an integer indicating the number of permutations desired to
obtain p-values. Sampling will shuffle genotypes at each locus to simulate
a panmictic population using the observed genotypes. Calculating the
p-value includes the observed statistics, so set your sample number to one
off for a round p-value (e.g. 
method 
an integer from 1 to 4 indicating the method of sampling
desired. see 
missing 
how should missing data be treated? 
cutoff 

quiet 

clonecorrect 
default 
strata 
a 
keep 
an 
plot 

hist 

index 

minsamp 
an 
legend 

... 
arguments to be passed on to 
Details
This table is intended to be a first look into the dynamics of
multilocus genotype diversity. Many of the statistics (except for the
index of association) are simply based on counts of multilocus genotypes
and do not take into account the actual allelic states.
Descriptions of the statistics can be found in the Algorithms and
Equations vignette: vignette("algo", package = "poppr").
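To illustrate what "counts of multilocus genotypes" means, the following base R sketch collapses each individual's genotypes across loci into a single label and tabulates the labels. The data frame geno and its locus names are made up for illustration; poppr's own MLG assignment (see mlg) may differ in detail.

```r
# Hypothetical toy data: 4 individuals genotyped at 2 loci.
geno <- data.frame(
  locus1 = c("1/1", "1/2", "1/1", "2/2"),
  locus2 = c("3/3", "3/4", "3/3", "3/3"),
  stringsAsFactors = FALSE
)

# One label per individual: genotypes pasted across all loci.
mlg_id <- do.call(paste, c(geno, sep = "|"))

# Counts per multilocus genotype; the allelic states themselves are no
# longer used once the labels have been tabulated.
mlg_counts <- table(mlg_id)
n_mlg <- length(mlg_counts)
```

Individuals 1 and 3 share a label, so they count as a single multilocus genotype observed twice.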
sampling
The sampling procedure is explicitly for testing the
index of association. None of the other diversity statistics (H, G, lambda,
E.5) are tested with this sampling due to the differing data types. To
obtain confidence intervals for these statistics, please see diversity_ci.
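The description of the sample argument notes that the observed statistic is included when the p-value is calculated. A minimal sketch of that convention is given below; perm_pvalue is a hypothetical helper written for this illustration and is not a poppr function, and the exact internal calculation may differ.

```r
# Hypothetical helper (not part of poppr): one-sided permutation p-value
# in which the observed value counts as one of the samples.
perm_pvalue <- function(observed, permuted) {
  (sum(permuted >= observed) + 1) / (length(permuted) + 1)
}

# Stand-in for 999 permuted statistics, all smaller than the observed value.
permuted <- seq(0, 1, length.out = 999)
p_small <- perm_pvalue(2, permuted)
```

With 999 permutations the denominator is 1000, so the smallest attainable p-value is exactly 0.001. This is why a sample number that is one off from a round number gives round p-values.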
rarefaction
Rarefaction analysis is performed on the number of
multilocus genotypes because it is relatively easy to estimate (Grünwald et
al., 2003). To obtain rarefied estimates of diversity, it is possible to
use diversity_ci with the argument rarefy = TRUE.
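As an illustration of what rarefying the number of multilocus genotypes involves, the sketch below implements the expected-richness formula of Hurlbert (1971) in base R. The function name rarefy_mlg and the example counts are assumptions made for this sketch; it is not the code poppr runs internally.

```r
# Hypothetical helper: expected number of MLGs in a random subsample of
# size n, given the observed count of each MLG (Hurlbert, 1971).
rarefy_mlg <- function(counts, n) {
  N <- sum(counts)
  stopifnot(n <= N)
  # Probability that a given MLG is absent from the subsample is
  # choose(N - count, n) / choose(N, n), computed on the log scale.
  p_absent <- exp(lchoose(N - counts, n) - lchoose(N, n))
  sum(1 - p_absent)
}

counts <- c(5, 3, 2)            # three MLGs among ten individuals
e_full <- rarefy_mlg(counts, 10)
e_one  <- rarefy_mlg(counts, 1)
```

Subsampling at the full sample size recovers all three genotypes, and a subsample of one individual always contains exactly one genotype.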
graphic
This function outputs a ggplot2 graphic of
histograms. These can be manipulated to be visualized in another manner by
retrieving the plot with the last_plot command from
ggplot2. A useful manipulation would be to arrange the graphs into a
single column so that the values of the statistic line up:
p <- last_plot(); p + facet_wrap(~population, ncol = 1, scales = "free_y")
The name for the groupings is "population" and the name for the x axis is
"value".
Value
A data frame with populations in rows and the following columns:
Pop 
A vector indicating the population factor 
N 
An integer vector indicating the number of individuals/isolates in the specified population. 
MLG 
An integer vector indicating the number of multilocus genotypes
found in the specified population, (see: 
eMLG 
The expected number of MLG at the lowest common sample size
(set by the parameter 
SE 
The standard error for the rarefaction analysis 
H 
Shannon-Wiener Diversity index 
G 
Stoddart and Taylor's Index 
lambda 
Simpson's index 
E.5 
Evenness 
Hexp 
Nei's gene diversity (expected heterozygosity) 
Ia 
A numeric vector giving the value of the Index of Association for
each population factor, (see 
p.Ia 
A numeric vector indicating the p-value for Ia from the number
of reshufflings indicated in 
rbarD 
A numeric vector giving the value of the Standardized Index of
Association for each population factor, (see 
p.rD 
A numeric vector indicating the p-value for rbarD from the
number of reshuffles indicated in 
File 
A vector indicating the name of the original data file. 
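The genotypic diversity columns can be reproduced by hand from a vector of MLG counts. The sketch below assumes the commonly used definitions (H as Shannon entropy with natural logarithms, lambda as one minus the sum of squared frequencies, G as the inverse of that sum, and E.5 as (G - 1)/(exp(H) - 1)); the function name mlg_diversity is hypothetical, and the Algorithms and Equations vignette remains the reference for what poppr computes.

```r
# Hypothetical helper: diversity statistics from counts of each MLG.
mlg_diversity <- function(counts) {
  p <- counts / sum(counts)          # MLG frequencies
  H <- -sum(p * log(p))              # Shannon-Wiener index
  G <- 1 / sum(p^2)                  # Stoddart and Taylor's index
  lambda <- 1 - sum(p^2)             # Simpson's index
  E5 <- (G - 1) / (exp(H) - 1)       # evenness
  c(H = H, G = G, lambda = lambda, E.5 = E5)
}

uneven <- mlg_diversity(c(5, 3, 2))
even   <- mlg_diversity(c(4, 4, 4))
```

When all genotypes are equally abundant, G equals the number of genotypes and E.5 equals 1; unequal abundances pull both values down.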
Note
The calculation of Hexp has changed from poppr 1.x. It was
previously calculated based on the diversity of multilocus genotypes,
resulting in a value of 1 for sexual populations. This was obviously not
Nei's 1978 expected heterozygosity. We have thus changed the statistic to
be the true value of Hexp by calculating (n/(n - 1))*(1 - sum(p^2)), where p is
the vector of allele frequencies at a given locus and n is the number of
observed alleles (Nei, 1978) in each locus, and then returning the average
over loci. Caution should be
exercised in interpreting the results of Hexp with polyploid organisms with
ambiguous ploidy. The lack of allelic dosage information will cause rare
alleles to be overrepresented and artificially inflate the index. This is
especially true with small sample sizes.
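The formula in this note can be worked through in a few lines of base R. In the sketch below, n is taken to be the total number of allele copies observed at a locus, which is an interpretation of "number of observed alleles"; the helper name hexp_locus and the allele counts are invented for illustration.

```r
# Hypothetical helper: Nei's (1978) gene diversity for one locus, given
# the number of copies observed for each allele.
hexp_locus <- function(allele_counts) {
  n <- sum(allele_counts)            # assumed meaning of n: allele copies
  p <- allele_counts / n             # allele frequencies
  (n / (n - 1)) * (1 - sum(p^2))
}

# Two loci scored in the same five diploid individuals (10 copies each).
loci <- list(
  locus1 = c(A = 6, B = 4),          # polymorphic
  locus2 = c(A = 10)                 # fixed for a single allele
)
per_locus <- vapply(loci, hexp_locus, numeric(1))
hexp <- mean(per_locus)              # average over loci
```

The fixed locus contributes zero, so the average is half of the value obtained for the polymorphic locus.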
Author(s)
Zhian N. Kamvar
References
Paul-Michael Agapow and Austin Burt. Indices of multilocus linkage disequilibrium. Molecular Ecology Notes, 1(1-2):101-102, 2001.
A.H.D. Brown, M.W. Feldman, and E. Nevo. Multilocus structure of natural populations of Hordeum spontaneum. Genetics, 96(2):523-536, 1980.
Niklaus J. Grünwald, Stephen B. Goodwin, Michael G. Milgroom, and William E. Fry. Analysis of genotypic diversity data for populations of microorganisms. Phytopathology, 93(6):738-46, 2003.
Bernhard Haubold and Richard R. Hudson. Lian 3.0: detecting linkage disequilibrium in multilocus data. Bioinformatics, 16(9):847-849, 2000.
Kenneth L. Heck Jr., Gerald van Belle, and Daniel Simberloff. Explicit calculation of the rarefaction diversity measurement and the determination of sufficient sample size. Ecology, 56(6):1459-1461, 1975.
Masatoshi Nei. Estimation of average heterozygosity and genetic distance from a small number of individuals. Genetics, 89(3):583-590, 1978.
S H Hurlbert. The nonconcept of species diversity: a critique and alternative parameters. Ecology, 52(4):577-586, 1971.
J.A. Ludwig and J.F. Reynolds. Statistical Ecology. A Primer on Methods and Computing. New York USA: John Wiley and Sons, 1988.
Simpson, E. H. Measurement of diversity. Nature 163: 688, 1949. doi:10.1038/163688a0
Good, I. J. (1953). On the Population Frequency of Species and the Estimation of Population Parameters. Biometrika 40(3/4): 237-264.
Lande, R. (1996). Statistics and partitioning of species diversity, and similarity among multiple communities. Oikos 76: 5-13.
Jari Oksanen, F. Guillaume Blanchet, Roeland Kindt, Pierre Legendre, Peter R. Minchin, R. B. O'Hara, Gavin L. Simpson, Peter Solymos, M. Henry H. Stevens, and Helene Wagner. vegan: Community Ecology Package, 2012. R package version 2.0-5.
E.C. Pielou. Ecological Diversity. Wiley, 1975.
Claude Elwood Shannon. A mathematical theory of communication. Bell System Technical Journal, 27:379-423, 623-656, 1948.
J M Smith, N H Smith, M O'Rourke, and B G Spratt. How clonal are bacteria? Proceedings of the National Academy of Sciences, 90(10):4384-4388, 1993.
J.A. Stoddart and J.F. Taylor. Genotypic diversity: estimation and prediction in samples. Genetics, 118(4):705-11, 1988.
See Also
clonecorrect, poppr.all, ia, missingno, mlg, diversity_stats, diversity_ci
Examples
data(nancycats)
poppr(nancycats)
## Not run:
# Sampling
poppr(nancycats, sample = 999, total = FALSE, plot = TRUE)
# Customizing the plot
library("ggplot2")
p <- last_plot()
p + facet_wrap(~population, scales = "free_y", ncol = 1)
# Turning off diversity statistics (see get_stats)
poppr(nancycats, total=FALSE, H = FALSE, G = FALSE, lambda = FALSE, E5 = FALSE)
# The previous version of poppr contained a definition of Hexp, which
# was calculated as (N/(N - 1))*lambda. It basically looks like an unbiased
# Simpson's index. This statistic was originally included in poppr because it
# was included in the program multilocus. It was later identified as
# an unbiased Simpson's diversity metric (Lande, 1996; Good, 1953).
data(Aeut)
uSimp <- function(x){
  lambda <- vegan::diversity(x, "simpson")
  x <- drop(as.matrix(x))
  if (length(dim(x)) > 1){
    N <- rowSums(x)
  } else {
    N <- sum(x)
  }
  return((N/(N-1))*lambda)
}
poppr(Aeut, uSimp = uSimp)
# Demonstration with viral data
# Note: this is a larger data set that could take a couple of minutes to run
# on slower computers.
data(H3N2)
strata(H3N2) <- data.frame(other(H3N2)$x)
setPop(H3N2) <- ~country
poppr(H3N2, total = FALSE, sublist=c("Austria", "China", "USA"),
clonecorrect = TRUE, strata = ~country/year)
## End(Not run)
