poppr.amova: Perform Analysis of Molecular Variance (AMOVA) on genind or...

Description Usage Arguments Details Value References See Also Examples

View source: R/amova.r


This function simplifies the process necessary for performing AMOVA in R. It gives user the choice of utilizing either the ade4 or the pegas implementation of AMOVA. See ade4::amova() (ade4) and pegas::amova() (pegas) for details on the specific implementation.


poppr.amova(x, hier = NULL, clonecorrect = FALSE, within = TRUE,
  dist = NULL, squared = TRUE, freq = TRUE, correction = "quasieuclid",
  sep = "_", filter = FALSE, threshold = 0,
  algorithm = "farthest_neighbor", missing = "loci", cutoff = 0.05,
  quiet = FALSE, method = c("ade4", "pegas"), nperm = 0)



a genind or genclone object


a hierarchical formula that defines your population hierarchy. (e.g.: ~Population/Subpopulation). See Details below.


logical if TRUE, the data set will be clone corrected with respect to the lowest level of the hierarchy. The default is set to FALSE. See clonecorrect() for details.


logical. When this is set to TRUE (Default), variance within individuals are calculated as well. If this is set to FALSE, The lowest level of the hierarchy will be the sample level. See Details below.


an optional distance matrix calculated on your data. If this is set to NULL (default), the raw pairwise distances will be calculated via dist().


if a distance matrix is supplied, this indicates whether or not it represents squared distances.


logical. If within = FALSE, the parameter rho is calculated (Ronfort et al. 1998; Meirmans and Liu 2018). By setting freq = TRUE, (default) allele counts will be converted to frequencies before the distance is calculated, otherwise, the distance will be calculated on allele counts, which can bias results in mixed-ploidy data sets. Note that this option has no effect for haploid or presence/absence data sets.


a character defining the correction method for non-euclidean distances. Options are ade4::quasieuclid() (Default), ade4::lingoes(), and ade4::cailliez(). See Details below.


Deprecated. As of poppr version 2, this argument serves no purpose.


logical When set to TRUE, mlg.filter will be run to determine genotypes from the distance matrix. It defaults to FALSE. You can set the parameters with algorithm and threshold arguments. Note that this will not be performed when within = TRUE. Note that the threshold should be the number of allowable substitutions if you don't supply a distance matrix.


a number indicating the minimum distance two MLGs must be separated by to be considered different. Defaults to 0, which will reflect the original (naive) MLG definition.


determines the type of clustering to be done.


(default) merges clusters based on the maximum distance between points in either cluster. This is the strictest of the three.


merges clusters based on the minimum distance between points in either cluster. This is the loosest of the three.


merges clusters based on the average distance between every pair of points between clusters.


specify method of correcting for missing data utilizing options given in the function missingno(). Default is "loci".


specify the level at which missing data should be removed/modified. See missingno() for details.


logical If FALSE (Default), messages regarding any corrections will be printed to the screen. If TRUE, no messages will be printed.


Which method for calculating AMOVA should be used? Choices refer to package implementations: "ade4" (default) or "pegas". See details for differences.


the number of permutations passed to the pegas implementation of amova.


The poppr implementation of AMOVA is a very detailed wrapper for the ade4 implementation. The output is an ade4::amova() class list that contains the results in the first four elements. The inputs are contained in the last three elements. The inputs required for the ade4 implementation are:

  1. a distance matrix on all unique genotypes (haplotypes)

  2. a data frame defining the hierarchy of the distance matrix

  3. a genotype (haplotype) frequency table.

All of this data can be constructed from a genind object, but can be daunting for a novice R user. This function automates the entire process. Since there are many variables regarding genetic data, some points need to be highlighted:

On Hierarchies:

The hierarchy is defined by different population strata that separate your data hierarchically. These strata are defined in the strata slot of genind and genclone objects. They are useful for defining the population factor for your data. See the function strata() for details on how to properly define these strata.

On Within Individual Variance:

Heterozygosities within genotypes are sources of variation from within individuals and can be quantified in AMOVA. When within = TRUE, poppr will split genotypes into haplotypes with the function make_haplotypes() and use those to calculate within-individual variance. No estimation of phase is made. This acts much like the default settings for AMOVA in the Arlequin software package. Within individual variance will not be calculated for haploid individuals or dominant markers as the haplotypes cannot be split further. Setting within = FALSE uses the euclidean distance of the allele frequencies within each individual

On Euclidean Distances:

With the ade4 implementation of AMOVA (utilized by poppr), distances must be Euclidean (due to the nature of the calculations). Unfortunately, many genetic distance measures are not always euclidean and must be corrected for before being analyzed. Poppr automates this with three methods implemented in ade4, quasieuclid(), lingoes(), and cailliez(). The correction of these distances should not adversely affect the outcome of the analysis.

On Filtering:

Filtering multilocus genotypes is performed by mlg.filter(). This can necessarily only be done AMOVA tests that do not account for within-individual variance. The distance matrix used to calculate the amova is derived from using mlg.filter() with the option stats = "distance", which reports the distance between multilocus genotype clusters. One useful way to utilize this feature is to correct for genotypes that have equivalent distance due to missing data. (See example below.)

On Methods:

Both ade4 and pegas have implementations of AMOVA, both of which are appropriately called "amova". The ade4 version is faster, but there have been questions raised as to the validity of the code utilized. The pegas version is slower, but careful measures have been implemented as to the accuracy of the method. It must be noted that there appears to be a bug regarding permuting analyses where within individual variance is accounted for (within = TRUE) in the pegas implementation. If you want to perform permutation analyses on the pegas implementation, you must set within = FALSE. In addition, while clone correction is implemented for both methods, filtering is only implemented for the ade4 version.

On Polyploids:

As of poppr version 2.7.0, this function is able to calculate phi statistics for within-individual variance for polyploid data with full dosage information. When a data set does not contain full dosage information for all samples, then the resulting pseudo-haplotypes will contain missing data, which would result in an incorrect estimate of variance.

Instead, the AMOVA will be performed on the distance matrix derived from allele counts or allele frequencies, depending on the freq option. This has been shown to be robust to estimates with mixed ploidy (Ronfort et al. 1998; Meirmans and Liu 2018). If you wish to brute-force your way to estimating AMOVA using missing values, you can split your haplotypes with the make_haplotypes() function.

One strategy for addressing ambiguous dosage in your polyploid data set would be to convert your data to polysat's genambig class with the as.genambig(), estimate allele frequencies with polysat::deSilvaFreq(), and use these frequencies to randomly sample alleles to fill in the ambiguous alleles.


a list of class amova from the ade4 or pegas package. See ade4::amova() or pegas::amova() for details.


Excoffier, L., Smouse, P.E. and Quattro, J.M. (1992) Analysis of molecular variance inferred from metric distances among DNA haplotypes: application to human mitochondrial DNA restriction data. Genetics, 131, 479-491.

Ronfort, J., Jenczewski, E., Bataillon, T., and Rousset, F. (1998). Analysis of population structure in autotetraploid species. Genetics, 150, 921–930.

Meirmans, P., Liu, S. (2018) Analysis of Molecular Variance (AMOVA) for Autopolyploids Submitted.

See Also

ade4::amova(), pegas::amova(), clonecorrect(), diss.dist(), missingno(), ade4::is.euclid(), strata(), make_haplotypes(), as.genambig()


strata(Aeut) <- other(Aeut)$population_hierarchy[-1]
agc <- as.genclone(Aeut)
amova.result <- poppr.amova(agc, ~Pop/Subpop)
amova.test <- randtest(amova.result) # Test for significance

## Not run: 

# You can get the same results with the pegas implementation
amova.pegas <- poppr.amova(agc, ~Pop/Subpop, method = "pegas")

# Clone correction is possible
amova.cc.result <- poppr.amova(agc, ~Pop/Subpop, clonecorrect = TRUE)
amova.cc.test <- randtest(amova.cc.result)

# Example with filtering
splitStrata(monpop) <- ~Tree/Year/Symptom
poppr.amova(monpop, ~Symptom/Year) # gets a warning of zero distances
poppr.amova(monpop, ~Symptom/Year, filter = TRUE, threshold = 0.1) # no warning

## End(Not run)

Example output

Loading required package: adegenet
Loading required package: ade4

   /// adegenet 2.0.1 is loaded ////////////

   > overview: '?adegenet'
   > tutorials/doc/questions: 'adegenetWeb()' 
   > bug reports/feature requests: adegenetIssues()

This is poppr version 2.5.0. To get started, type package?poppr
OMP parallel support: available

This is a genclone object
Genotype information:

   119 original multilocus genotypes 
   187 diploid individuals
    56 dominant loci

Population information:

     2 strata - Pop, Subpop
     2 populations defined - Athena, Mt. Vernon

 No missing values detected.

ade4::amova(samples = xtab, distances = xdist, structures = xstruct)

                            Df    Sum Sq     Mean Sq
Between Pop                  1 1051.2345 1051.234516
Between samples Within Pop  16  273.4575   17.091091
Within samples             169  576.5059    3.411277
Total                      186 1901.1979   10.221494

                                           Sigma          %
Variations  Between Pop                11.063446  70.006786
Variations  Between samples Within Pop  1.328667   8.407483
Variations  Within samples              3.411277  21.585732
Total variations                       15.803391 100.000000

Phi-samples-total 0.7841427
Phi-samples-Pop   0.2803128
Phi-Pop-total     0.7000679

class: krandtest lightkrandtest 
Monte-Carlo tests
Call: randtest.amova(xtest = amova.result)

Number of tests:   3 

Adjustment method for multiple comparisons:   none 
Permutation number:   99 
                        Test       Obs   Std.Obs   Alter Pvalue
1  Variations within samples  3.411277 -31.60269    less   0.01
2 Variations between samples  1.328667  18.43717 greater   0.01
3     Variations between Pop 11.063446  13.20472 greater   0.01

 No missing values detected.

Warning message:
In is.euclid(xdist) : Zero distance(s)

	Analysis of Molecular Variance

Call: pegas::amova(formula = hier, data = hierdf, nperm = nperm, is.squared = FALSE)

             SSD         MSD  df
Pop    1051.2345 1051.234516   1
Subpop  273.4575   17.091091  16
Error   576.5059    3.411277 169
Total  1901.1979   10.221494 186

Variance components:
      Pop    Subpop     Error 
11.063446  1.328667  3.411277 

Variance coefficients:
       a        b        c 
10.29589 11.16949 93.36898 

       Pop     Subpop      Error 
0.70006786 0.08407483 0.21585732 

 No missing values detected.

ade4::amova(samples = xtab, distances = xdist, structures = xstruct)

                            Df    Sum Sq    Mean Sq
Between Pop                  1  741.9872 741.987234
Between samples Within Pop  16  185.6877  11.605483
Within samples             123  520.1123   4.228555
Total                      140 1447.7872  10.341337

                                            Sigma          %
Variations  Between Pop                10.4131525  66.777680
Variations  Between samples Within Pop  0.9520545   6.105355
Variations  Within samples              4.2285550  27.116965
Total variations                       15.5937620 100.000000

Phi-samples-total 0.7288303
Phi-samples-Pop   0.1837727
Phi-Pop-total     0.6677768

class: krandtest lightkrandtest 
Monte-Carlo tests
Call: randtest.amova(xtest = amova.cc.result)

Number of tests:   3 

Adjustment method for multiple comparisons:   none 
Permutation number:   99 
                        Test        Obs    Std.Obs   Alter Pvalue
1  Variations within samples  4.2285550 -22.149571    less   0.01
2 Variations between samples  0.9520545   9.136886 greater   0.01
3     Variations between Pop 10.4131525  10.094187 greater   0.01

 No loci with missing values above 5% found.

Distance matrix is non-euclidean.
Utilizing quasieuclid correction method. See ?quasieuclid for details.
ade4::amova(samples = xtab, distances = xdist, structures = xstruct)

                                Df      Sum Sq   Mean Sq
Between Symptom                  1    8.751823  8.751823
Between samples Within Symptom   4  220.720591 55.180148
Within samples                 688 2414.685005  3.509717
Total                          693 2644.157419  3.815523

                                                Sigma          %
Variations  Between Symptom                -0.1678446  -4.383251
Variations  Between samples Within Symptom  0.4873550  12.727243
Variations  Within samples                  3.5097166  91.656008
Total variations                            3.8292270 100.000000

Phi-samples-total    0.08343992
Phi-samples-Symptom  0.12192802
Phi-Symptom-total   -0.04383251

Warning messages:
1: In is.euclid(xdist) : Zero distance(s)
2: In is.euclid(distmat) : Zero distance(s)
Filtering ...
Original multilocus genotypes ... 264
Contracted multilocus genotypes ... 236

 No loci with missing values above 5% found.

Distance matrix is non-euclidean.
Utilizing quasieuclid correction method. See ?quasieuclid for details.
ade4::amova(samples = xtab, distances = xdist, structures = xstruct)

                                Df      Sum Sq   Mean Sq
Between Symptom                  1    8.793784  8.793784
Between samples Within Symptom   4  221.212012 55.303003
Within samples                 688 2422.331271  3.520830
Total                          693 2652.337066  3.827326

                                                Sigma          %
Variations  Between Symptom                -0.1681045  -4.376428
Variations  Between samples Within Symptom  0.4884090  12.715226
Variations  Within samples                  3.5208303  91.661202
Total variations                            3.8411348 100.000000

Phi-samples-total    0.08338798
Phi-samples-Symptom  0.12182086
Phi-Symptom-total   -0.04376428

Warning message:
system call failed: Cannot allocate memory 

poppr documentation built on March 18, 2018, 1:14 p.m.