poppr-package: The 'poppr' R package


The poppr R package

Description

Poppr provides tools for population genetic analysis that include genotypic diversity measures, genetic distances with bootstrap support, native organization and handling of population hierarchies, and clone correction.

To cite poppr, please use citation("poppr"). When referring to poppr in your manuscript, please use lower case unless it occurs at the beginning of a sentence.

Details

This package relies on the adegenet package and is built around its genind and genlight objects. Genind objects store genetic information in a table of allele counts, while genlight objects store SNP data efficiently by packing binary allele calls into single bits. Poppr extends these objects into new objects called genclone and snpclone, respectively. These objects are designed for the analysis of clonal organisms: they add an @mlg slot that keeps track of multilocus genotypes and multilocus lineages.
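To show what the @mlg slot records, the minimal base-R sketch below (not poppr's implementation) gives each distinct multilocus genotype an integer ID, so individuals that are identical at every locus share an ID. The helper assign_mlgs() and the toy genotype table are hypothetical. In poppr, mlg.vector() returns the assignments stored in a genclone or snpclone object.

```r
# Hypothetical helper (not part of poppr): give every distinct
# multilocus genotype an integer ID, in order of first appearance.
assign_mlgs <- function(geno) {
  geno <- as.data.frame(geno, stringsAsFactors = FALSE)
  # Collapse each individual's genotype across all loci into one key
  keys <- do.call(paste, c(geno, sep = "|"))
  # Individuals with identical keys receive the same ID
  match(keys, unique(keys))
}

geno <- data.frame(
  locus1 = c("101/103", "101/103", "105/105", "101/103"),
  locus2 = c("220/220", "220/220", "218/220", "218/220")
)
assign_mlgs(geno)  # 1 1 2 3
```

In this sketch a missing value would simply become one more allele label. poppr's treatment of missing data may differ.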

Documentation

Documentation is available for any function by typing ?function_name in the R console. Detailed topic explanations live in the package vignettes:

Vignette                        Command
Data import and manipulation    vignette("poppr_manual", "poppr")
Algorithms and Equations        vignette("algo", "poppr")
Multilocus Genotype Analysis    vignette("mlg", "poppr")

Essential functions for importing and manipulating data are described in the Data import and manipulation vignette. The algorithms used in poppr are explained in the Algorithms and Equations vignette, and working with multilocus genotypes is covered in the Multilocus Genotype Analysis vignette.

Examples of analyses are available in a primer written by Niklaus J. Grünwald, Zhian N. Kamvar, and Sydney E. Everhart at https://grunwaldlab.github.io/Population_Genetics_in_R/.

Getting help

If you have a specific question or issue with poppr, feel free to post to the Google group at https://groups.google.com/d/forum/poppr. If you find a bug and are a GitHub user, you can submit a bug report at https://github.com/grunwaldlab/poppr/issues. Otherwise, leave a message on the group. Personal emails are highly discouraged because they do not allow others to learn from the answer.

Functions in poppr

Below are descriptions and links to functions found in poppr. Be aware that all functions in adegenet are also available. The functions are documented as:

  • function_name() (data type) - Description

Where ‘data type’ refers to the type of data that can be used:

  m - a genclone or genind object
  s - a snpclone or genlight object
  x - a different data type (e.g. a matrix from mlg.table())

Data import/export

  • getfile() (x) - Provides a quick GUI to grab files for import

  • read.genalex() (x) - Reads GenAlEx-formatted CSV files into a genclone or genind object

  • genind2genalex() (m) - Exports genind objects to GenAlEx-formatted CSV files

  • genclone2genind() (m) - Removes the @mlg slot from genclone objects

  • as.genambig() (m) - Converts genind data to polysat's genambig data structure.

  • bootgen2genind() (x) - Converts bootgen objects to genind objects (see aboot() for details)

Data Structures

Poppr defines two data structures: "genclone" (based on adegenet's genind) and "snpclone" (based on adegenet's genlight, for large SNP data sets). Both are defined by an extra @mlg slot holding multilocus genotype assignments, which can be either a numeric vector or an MLG class object.

  • genclone - Handles microsatellite, presence/absence, and small SNP data sets

  • snpclone - Designed to handle larger binary SNP data sets.

  • MLG - An internal class holding a data frame of multilocus genotype assignments that acts like a vector, allowing the user to easily switch between different MLG definitions.

  • bootgen - An internal class used exclusively by aboot() that inherits from adegenet's virtual gen class. It is designed to allow sampling loci with replacement.

  • bruvomat - An internal class designed to handle bootstrapping for Bruvo's distance where blocks of integer loci can be shuffled.

Data manipulation

  • as.genclone() (m) - Converts genind objects to genclone objects

  • missingno() (m) - Handles missing data

  • clonecorrect() (m | s) - Clone-censors at a specified population hierarchy

  • informloci() (m) - Detects and removes phylogenetically uninformative loci

  • popsub() (m | s) - Subsets data by population

  • shufflepop() (m) - Shuffles genotypes at each locus using four different shuffling algorithms

  • recode_polyploids() (m | x) - Recodes polyploid data sets with missing alleles imported as "0"

  • make_haplotypes() (m | s) - Splits data into pseudo-haplotypes. This is mainly used in AMOVA.

  • test_replen() (m) - Tests for inconsistent repeat lengths in microsatellite data. For use with bruvo.dist() and related functions.

  • fix_replen() (m) - Fixes inconsistent repeat lengths. For use with bruvo.dist() and related functions.
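The idea behind clone correction can be sketched in a few lines of base R, assuming MLG assignments and population labels are already available as plain vectors: keep one individual per multilocus genotype within each population. The function clone_correct() is hypothetical and keeps the first occurrence of each genotype. poppr's clonecorrect() works directly on genclone or snpclone objects and their population strata.

```r
# Hypothetical sketch (not poppr's clonecorrect()): return the indices of
# individuals to keep, one per multilocus genotype within each population.
clone_correct <- function(mlg, pop) {
  # One key per (population, MLG) combination
  keys <- paste(pop, mlg, sep = "\t")
  # Drop every individual whose combination was already seen
  which(!duplicated(keys))
}

mlg <- c(1, 1, 2, 1, 3, 3)
pop <- c("A", "A", "A", "B", "B", "B")
clone_correct(mlg, pop)  # 1 3 4 5
```

MLG 1 is kept once in population A and once in population B, which is why the population level chosen for correction matters.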

Genetic distances

  • bruvo.dist() (m) - Bruvo's distance (see also: fix_replen())

  • diss.dist() (m) - Absolute genetic distance (see prevosti.dist())

  • nei.dist() (m | x) - Nei's 1978 genetic distance

  • rogers.dist() (m | x) - Rogers' Euclidean distance

  • reynolds.dist() (m | x) - Reynolds' coancestry distance

  • edwards.dist() (m | x) - Edwards' angular distance

  • prevosti.dist() (m | x) - Prevosti's absolute genetic distance

  • bitwise.dist() (s) - Calculates fast pairwise distances for genlight objects.
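For intuition about Bruvo's distance, here is a simplified base-R sketch for two diploid microsatellite genotypes. It is not poppr's bruvo.dist(). Each allele pair contributes 1 - 2^(-x), where x is the number of repeat units separating the alleles, and the distance is the smallest mean over the possible allele pairings. The function bruvo_diploid() is hypothetical. Rounding repeat differences to whole repeats is an assumption of this sketch (real allele sizes are not always exact multiples, which is what test_replen() and fix_replen() address), and the sketch does not handle missing alleles in polyploids.

```r
# Hypothetical sketch of Bruvo's distance for two diploid genotypes,
# given allele sizes and the repeat length of the locus.
bruvo_diploid <- function(a, b, replen) {
  step_dist <- function(x, y) {
    # Number of repeat units separating each paired allele
    steps <- abs(round((x - y) / replen))
    mean(1 - 2^(-steps))
  }
  # A diploid has two ways to pair alleles; keep the closer pairing
  min(step_dist(a, b), step_dist(a, rev(b)))
}

bruvo_diploid(c(100, 104), c(104, 100), replen = 2)  # 0
bruvo_diploid(c(100, 100), c(102, 104), replen = 2)  # 0.625
```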

Bootstrapping

  • aboot() (m | s | x) - Creates a bootstrapped dendrogram for any distance measure

  • bruvo.boot() (m) - Produces dendrograms with bootstrap support based on Bruvo's distance

  • diversity_boot() (x) - Generates bootstrap distributions of diversity statistics for multilocus genotypes

  • diversity_ci() (m | s | x) - Generates confidence intervals for multilocus genotype diversity.

  • resample.ia() (m) - Calculates the index of association over subsets of data.
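As a rough picture of bootstrapped diversity statistics, the base-R sketch below resamples individuals with replacement and reports a percentile interval for Simpson's index (1 - sum(p^2)). The functions simpson() and boot_simpson_ci() are hypothetical. diversity_boot() and diversity_ci() cover several statistics, and their exact resampling scheme may differ from this sketch.

```r
# Simpson's index from per-individual MLG assignments
simpson <- function(mlg) {
  p <- table(mlg) / length(mlg)
  1 - sum(p^2)
}

# Hypothetical percentile bootstrap interval for Simpson's index
boot_simpson_ci <- function(mlg, n_boot = 999, level = 0.95) {
  reps <- replicate(n_boot, {
    # Resample individuals (not MLGs) with replacement
    simpson(mlg[sample.int(length(mlg), replace = TRUE)])
  })
  alpha <- (1 - level) / 2
  quantile(reps, c(alpha, 1 - alpha), names = FALSE)
}

set.seed(1)
mlg <- c(1, 1, 1, 2, 2, 3, 4, 5, 5, 6)
simpson(mlg)          # 0.8
boot_simpson_ci(mlg)  # lower and upper bounds
```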

Multilocus Genotypes

  • mlg() (m | s) - Calculates the number of multilocus genotypes

  • mll() (m | s) - Displays the current multilocus lineages (genotypes) defined.

  • nmll() (m | s) - Same as mlg().

  • mlg.crosspop() (m | s) - Finds all multilocus genotypes that cross populations

  • mlg.table() (m | s) - Returns a table of populations by multilocus genotypes

  • mlg.vector() (m | s) - Returns a vector of a numeric multilocus genotype assignment for each individual

  • mlg.id() (m | s) - Finds all individuals associated with a single multilocus genotype

  • mlg.filter() (m | s) - Collapses MLGs by genetic distance

  • filter_stats() (m | s) - Runs mlg.filter() with all clustering algorithms and plots the results

  • cutoff_predictor() (x) - Predicts a cutoff threshold from mlg.filter() results.

  • mll.custom() (m | s) - Allows for the custom definition of multilocus lineages

  • mll.levels() (m | s) - Allows the user to change levels of custom MLLs.

  • mll.reset() (m | s) - Reset multilocus lineages.

  • diversity_stats() (x) - Creates a table of diversity indices for multilocus genotypes.
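The diversity indices for multilocus genotypes reduce to simple formulas on MLG counts. The base-R sketch below uses the textbook forms of Shannon's H, Stoddart and Taylor's G, Simpson's lambda and evenness E.5. It assumes these match the columns that diversity_stats() reports; check the Algorithms and Equations vignette for poppr's exact definitions. The function diversity_indices() is hypothetical.

```r
# Hypothetical sketch: genotypic diversity indices from MLG counts
# (for example, one row of an mlg.table() matrix).
diversity_indices <- function(counts) {
  counts <- counts[counts > 0]   # MLGs absent from the population add nothing
  p <- counts / sum(counts)
  H <- -sum(p * log(p))          # Shannon-Wiener index
  G <- 1 / sum(p^2)              # Stoddart and Taylor's index
  lambda <- 1 - sum(p^2)         # Simpson's index
  E5 <- (G - 1) / (exp(H) - 1)   # evenness (undefined for a single MLG)
  c(H = H, G = G, lambda = lambda, E.5 = E5)
}

# Four equally common MLGs: H = log(4), G = 4, lambda = 0.75, E.5 = 1
diversity_indices(c(1, 1, 1, 1, 0))
```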

Index of Association Analysis

Analysis of multilocus linkage disequilibrium.

  • ia() (m) - Calculates the index of association

  • pair.ia() (m) - Calculates the index of association for all loci pairs.

  • win.ia() (s) - Index of association windows for genlight objects.

  • samp.ia() (s) - Index of association on random subsets of loci for genlight objects.
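The index of association compares the variance of summed pairwise distances across loci (observed) with the sum of per-locus variances (expected if loci are unlinked). The standardized rbarD rescales that difference by the covariance expected if all loci were perfectly linked. The base-R sketch below is a simplification, not poppr's ia(): the hypothetical index_of_association() scores each locus as 0/1 mismatch, as for haploid data, whereas poppr also handles diploid and polyploid distances.

```r
# Hypothetical sketch: I_A and rbarD for a table of individuals x loci.
index_of_association <- function(geno) {
  geno <- as.matrix(geno)
  stopifnot(ncol(geno) >= 2)
  pairs <- combn(nrow(geno), 2)
  # d[k, j] is 1 when the k-th pair of individuals differs at locus j
  d <- t(apply(pairs, 2, function(ij) {
    as.integer(geno[ij[1], ] != geno[ij[2], ])
  }))
  pop_var <- function(x) mean((x - mean(x))^2)
  var_j <- apply(d, 2, pop_var)   # per-locus variances
  V_E <- sum(var_j)               # expected variance under no linkage
  V_O <- pop_var(rowSums(d))      # observed variance of summed distances
  Ia <- V_O / V_E - 1
  # 2 * sum over locus pairs j < k of sqrt(var_j * var_k)
  denom <- sum(sqrt(var_j))^2 - V_E
  rbarD <- (V_O - V_E) / denom
  c(Ia = Ia, rbarD = rbarD)
}

# Two perfectly linked loci: both statistics come out at 1
linked <- data.frame(
  locus1 = c("a", "a", "b", "b"),
  locus2 = c("x", "x", "y", "y")
)
index_of_association(linked)
```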

Population Genetic Analysis

  • poppr.amova() (m | s) - Analysis of Molecular Variance (as implemented in ade4)

  • poppr() (m | x) - Returns a diversity table by population

  • poppr.all() (m | x) - Returns a diversity table by population for all compatible files specified

  • private_alleles() (m) - Tabulates the occurrences of alleles that only occur in one population.

  • locus_table() (m) - Creates a table of summary statistics per locus.

  • rrmlg() (m | x) - Round-robin multilocus genotype estimates.

  • rraf() (m) - Round-robin allele frequency estimates.

  • pgen() (m) - Probability of genotypes.

  • psex() (m) - Probability of observing a genotype more than once.

  • rare_allele_correction (m) - Rules for correcting rare alleles in round-robin estimates.

  • incomp() (m) - Check data for incomparable samples.
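Under Hardy-Weinberg and linkage equilibrium, the probability of a diploid multilocus genotype is a product across loci: p^2 for a homozygote and 2pq for a heterozygote. The base-R sketch below takes allele frequencies as given. poppr's pgen() can instead use round-robin frequencies, and its details may differ. The binomial form of psex used here, the probability of drawing the genotype n or more times in N samples, is one common definition and is an assumption of this sketch. pgen_diploid() and psex_sketch() are hypothetical names.

```r
# Hypothetical sketch: genotype probability from per-locus allele frequencies
pgen_diploid <- function(genotype, freqs) {
  per_locus <- mapply(function(alleles, f) {
    p <- f[[alleles[1]]]
    q <- f[[alleles[2]]]
    if (alleles[1] == alleles[2]) p^2 else 2 * p * q
  }, genotype, freqs)
  prod(per_locus)
}

# Probability of observing the genotype n or more times in N samples
psex_sketch <- function(pgen, N, n = 1) {
  pbinom(n - 1, size = N, prob = pgen, lower.tail = FALSE)
}

freqs <- list(
  L1 = c(A = 0.5, B = 0.5),
  L2 = c(C = 0.2, D = 0.8)
)
genotype <- list(L1 = c("A", "B"), L2 = c("C", "C"))
pg <- pgen_diploid(genotype, freqs)  # 2 * 0.5 * 0.5 * 0.2^2, about 0.02
psex_sketch(pg, N = 10)              # equals 1 - (1 - pg)^10 when n = 1
```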

Visualization

  • imsn() (m | s) - Interactive construction and visualization of minimum spanning networks

  • plot_poppr_msn() (m | s | x) - Plots minimum spanning networks produced in poppr with scale bar and legend

  • greycurve() (x) - Helper to determine the appropriate parameters for adjusting the grey level for msn functions

  • bruvo.msn() (m) - Produces minimum spanning networks based on Bruvo's distance, colored by population

  • poppr.msn() (m | s | x) - Produces a minimum spanning network for any pairwise distance matrix related to the data

  • info_table() (m) - Creates a heatmap representing missing data or observed ploidy

  • genotype_curve() (m | x) - Creates a series of boxplots to demonstrate how many markers are needed to represent the diversity of your data.
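The genotype accumulation curve behind genotype_curve() can be approximated in base R: for each number of loci r, repeatedly sample r loci at random and count the distinct genotypes they resolve. A plateau suggests that adding markers no longer reveals new genotypes. The function genotype_accumulation() and the toy table are hypothetical, and poppr's sampling scheme and defaults may differ.

```r
# Hypothetical sketch: matrix of distinct-genotype counts,
# one row per replicate and one column per number of loci sampled.
genotype_accumulation <- function(geno, n_rep = 100) {
  geno <- as.data.frame(geno, stringsAsFactors = FALSE)
  n_loci <- ncol(geno)
  sapply(seq_len(n_loci), function(r) {
    replicate(n_rep, {
      loci <- sample.int(n_loci, r)
      nrow(unique(geno[, loci, drop = FALSE]))
    })
  })
}

toy_geno <- data.frame(
  l1 = c("a", "a", "b", "b"),
  l2 = c("x", "x", "x", "y"),
  l3 = c("m", "m", "n", "n")
)
set.seed(42)
curves <- genotype_accumulation(toy_geno, n_rep = 20)
boxplot(curves, xlab = "Number of loci", ylab = "Distinct genotypes")
```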

Datasets

  • Aeut() - (AFLP) Oomycete root rot pathogen Aphanomyces euteiches (Grünwald and Hoheisel, 2006)

  • monpop() - (SSR) Peach brown rot pathogen Monilinia fructicola (Everhart and Scherm, 2015)

  • partial_clone() - (SSR) partially-clonal data simulated via simuPOP (Peng and Amos, 2008)

  • Pinf() - (SSR) Potato late blight pathogen Phytophthora infestans (Goss et al., 2014)

  • Pram() - (SSR) Sudden Oak Death pathogen Phytophthora ramorum (Kamvar et al., 2015; Goss et al., 2009)

Author(s)

Zhian N. Kamvar, Jonah C. Brooks, Sydney E. Everhart, Javier F. Tabima, Stacy Krueger-Hadfield, Erik Sotka, Niklaus J. Grünwald

Maintainer: Zhian N. Kamvar

References

——— Papers announcing poppr ———

Kamvar ZN, Tabima JF, Grünwald NJ (2014) Poppr: an R package for genetic analysis of populations with clonal, partially clonal, and/or sexual reproduction. PeerJ 2:e281. doi: 10.7717/peerj.281

Kamvar ZN, Brooks JC, Grünwald NJ (2015) Novel R tools for analysis of genome-wide population genetic data with emphasis on clonality. Front. Genet. 6:208. doi: 10.3389/fgene.2015.00208

——— Papers referencing data sets ———

Grünwald NJ, Hoheisel GA (2006) Hierarchical Analysis of Diversity, Selfing, and Genetic Differentiation in Populations of the Oomycete Aphanomyces euteiches. Phytopathology 96:1134-1141. doi: 10.1094/PHYTO-96-1134

Everhart SE, Scherm H (2015) Fine-scale genetic structure of Monilinia fructicola during brown rot epidemics within individual peach tree canopies. Phytopathology 105:542-549. doi: 10.1094/PHYTO-03-14-0088-R

Peng B, Amos C (2008) Forward-time simulations of nonrandom mating populations using simuPOP. Bioinformatics 24(11):1408-1409.

Goss EM, Tabima JF, Cooke DEL, Restrepo S, Fry WE, Forbes GA, Fieland VJ, Cardenas M, Grünwald NJ (2014) The Irish potato famine pathogen Phytophthora infestans originated in central Mexico rather than the Andes. Proceedings of the National Academy of Sciences 111:8791-8796. doi: 10.1073/pnas.1401884111

Kamvar ZN, Larsen MM, Kanaskie AM, Hansen EM, Grünwald NJ (2015) Spatial and temporal analysis of populations of the sudden oak death pathogen in Oregon forests. Phytopathology 105:982-989. doi: 10.1094/PHYTO-12-14-0350-FI

Goss EM, Larsen M, Chastagner GA, Givens DR, Grünwald NJ (2009) Population genetic analysis infers migration pathways of Phytophthora ramorum in US nurseries. PLoS Pathog. 5:e1000583. doi: 10.1371/journal.ppat.1000583


grunwaldlab/poppr documentation built on March 18, 2024, 11:24 p.m.