adegenet.package | R Documentation |
This package is devoted to the multivariate analysis of genetic marker
data. These data can be codominant markers (e.g. microsatellites) or
presence/absence data (e.g. AFLP), and can have any level of ploidy. 'adegenet'
defines three formal (S4) classes:
- genind: a class for data of individuals ("genind" stands for genotypes-individuals).
- genpop: a class for data of groups of individuals ("genpop" stands for genotypes-populations).
- genlight: a class for genome-wide SNP data.
For more information about these classes, type "class ? genind", "class ?
genpop", or "?genlight".
Essential functionalities of the package are presented throughout 4
tutorials, accessible using adegenetTutorial(which="name-below"):
- basics: introduction to the package.
- spca: multivariate analysis of spatial genetic patterns.
- dapc: population structure and group assignment using DAPC.
- genomics: introduction to the class genlight for the handling and analysis of genome-wide SNP data.
Note: In older versions of adegenet, these tutorials were available as
vignettes, accessible through the function vignette("name-below",
package="adegenet"):
- adegenet-basics
- adegenet-spca
- adegenet-dapc
- adegenet-genomics
Important functions are also summarized below.
=== IMPORTING DATA ===
= TO GENIND OBJECTS =
adegenet imports data to genind objects from the following software:
- STRUCTURE: see read.structure
- GENETIX: see read.genetix
- FSTAT: see read.fstat
- Genepop: see read.genepop
To import data from any of these formats, you can also use the general
function import2genind.
In addition, adegenet can extract polymorphic sites from nucleotide and
amino-acid alignments:
- DNA files: use read.dna from the ape package, and then extract SNPs from DNA alignments using DNAbin2genind.
- protein sequence alignments: polymorphic sites can be extracted from protein sequence alignments in alignment format (package seqinr, see as.alignment) using the function alignment2genind.
The function fasta2DNAbin reads fasta files into a DNAbin object with
minimal RAM requirements.
It is also possible to read genotypes coded by character strings from a
data.frame in which genotypes are in rows and markers in columns. For this,
use df2genind. Note that df2genind can be used for any level of ploidy.
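As a minimal sketch of this import route, the toy data.frame below is converted with df2genind. The individuals, marker names, allele codes, and population labels are all hypothetical and invented for illustration; the sketch assumes diploid genotypes with alleles separated by "/".

```r
library(adegenet)

# Hypothetical toy data: 4 diploid individuals (rows), 2 markers (columns);
# the two alleles of each genotype are separated by "/"
geno <- data.frame(
  locusA = c("11/12", "11/11", "12/12", "11/12"),
  locusB = c("20/21", "21/21", "20/20", "20/21"),
  row.names = c("ind1", "ind2", "ind3", "ind4"),
  stringsAsFactors = FALSE
)

# Convert to a genind object; 'pop' assigns each individual to a group
obj <- df2genind(geno, sep = "/", ploidy = 2,
                 pop = factor(c("A", "A", "B", "B")))

nInd(obj)   # number of individuals
nLoc(obj)   # number of markers
tab(obj)    # allele counts, one column per allele
```

For other ploidy levels, the same call should apply with a different value of the ploidy argument and the matching number of alleles per genotype string.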
= TO GENLIGHT OBJECTS =
SNP data can be read from the following formats:
- PLINK: see function read.PLINK
- .snp (adegenet's own format): see function read.snp
SNPs can also be extracted from aligned DNA sequences in fasta format,
using fasta2genlight.
=== EXPORTING DATA ===
Genotypes can be recoded from a genind object into a data.frame of
character strings, using any separator between alleles. This covers formats
used by many programs such as GENETIX or STRUCTURE. For this, see genind2df.
Also note that the pegas package imports genind objects using the function
as.loci.
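A short sketch of this export, using the nancycats dataset shipped with the package; the "/" separator is an arbitrary choice made for illustration.

```r
library(adegenet)
data(nancycats)

# Recode genotypes as character strings, with "/" between alleles;
# by default the first column holds the population of each individual
cats_df <- genind2df(nancycats, sep = "/")

dim(cats_df)          # individuals x (population column + markers)
head(cats_df[, 1:4])  # first few genotypes
```

The resulting data.frame can then be written to disk with standard R tools such as write.table.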
=== MANIPULATING DATA ===
Several functions allow one to manipulate genind or genpop objects:
- genind2genpop: converts a genind object to a genpop.
- seploc: creates one object per marker; for genlight objects, creates blocks of SNPs.
- seppop: creates one object per population.
- tab: accesses the allele data (counts or frequencies) of an object (genind and genpop).
- x[i,j]: creates a new object keeping only the genotypes (or populations) indexed by 'i' and the alleles indexed by 'j'.
- makefreq: returns a table of allelic frequencies from a genpop object.
- repool: merges genotypes from different gene pools into one single genind object.
- propTyped: returns the proportion of available (typed) data, by individual, population, and/or locus.
- selPopSize: subsets data, retaining only genotypes from populations whose sample size is above a given level.
- pop: sets the population of a set of genotypes.
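The functions listed above can be sketched on the nancycats dataset as follows. This is an illustrative walk-through rather than a recommended workflow; the subset size (10 individuals) and the sample-size threshold (nMin = 15) are arbitrary.

```r
library(adegenet)
data(nancycats)

# Pool individuals into populations (genind -> genpop)
cats_pop <- genind2genpop(nancycats, quiet = TRUE)

# Split into one genind object per population, or per marker
by_pop <- seppop(nancycats)
by_loc <- seploc(nancycats)

# Access allele counts, or frequencies with missing data replaced by the mean
counts <- tab(nancycats)
freqs  <- tab(nancycats, freq = TRUE, NA.method = "mean")

# Keep only the first 10 genotypes
first10 <- nancycats[1:10, ]

# Proportion of typed (non-missing) data per marker
typed <- propTyped(nancycats, by = "loc")

# Retain only populations with at least 15 sampled individuals
big <- selPopSize(nancycats, nMin = 15)
```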
=== ANALYZING DATA ===
Several functions implement standard and less common analyses:
- HWE.test.genind: performs HWE tests for all combinations of populations and loci.
- dist.genpop: computes genetic distances among populations (5 distances available).
- monmonier: implementation of the Monmonier algorithm, used to seek genetic boundaries among individuals or populations. Optimized boundaries can be obtained using optimize.monmonier. Objects of the class monmonier can be plotted and printed using the corresponding methods.
- spca: implements the spatial Principal Component Analysis of Jombart et al. (2008).
- global.rtest: implements the test for global spatial structures of Jombart et al. (2008).
- local.rtest: implements the test for local spatial structures of Jombart et al. (2008).
- propShared: computes the proportion of shared alleles in a set of genotypes (i.e. from a genind object).
- propTyped: function to investigate missing data in several ways.
- scaleGen: generic method to scale genind or genpop objects before a principal component analysis.
- Hs: computes the average expected heterozygosity by population in a genpop. Classically used as a measure of genetic diversity.
- find.clusters and dapc: implement the Discriminant Analysis of Principal Components (DAPC, Jombart et al., 2010).
- seqTrack: implements the SeqTrack algorithm for reconstructing transmission trees of pathogens (Jombart et al., 2010).
- glPca: implements PCA for genlight objects.
- gengraph: implements some simple graph-based clustering using genetic data.
- snpposi.plot and snpposi.test: visualize the distribution of SNPs on a genetic sequence and test their randomness.
- adegenetServer: opens up a web interface for some functionalities of the package (DAPC with cross validation and feature selection).
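As one example from the list above, a DAPC can be run non-interactively by supplying the number of retained components directly. The sketch below uses the nancycats dataset with its predefined colonies as groups; n.pca = 20 and n.da = 2 are arbitrary illustrative values, not tuned recommendations.

```r
library(adegenet)
data(nancycats)

# DAPC using the populations stored in the object as prior groups;
# giving n.pca and n.da avoids the interactive prompts
dapc_cats <- dapc(nancycats, n.pca = 20, n.da = 2)

head(dapc_cats$ind.coord)       # coordinates of individuals on the discriminant axes
head(dapc_cats$posterior[, 1:3]) # membership probabilities (first 3 groups)
summary(dapc_cats)$assign.prop  # overall proportion of successful reassignment
```

In practice the number of retained principal components affects the result, so it is usually chosen with more care than in this sketch.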
=== GRAPHICS ===
- colorplot: plots points with associated values for up to three variables represented by colors using the RGB system; useful for spatial mapping of principal components.
- loadingplot: plots loadings of variables. Useful for representing the contribution of alleles to a given principal component in a multivariate method.
- scatter.dapc: scatterplots for DAPC results.
- compoplot: plots membership probabilities from a DAPC object.
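A brief sketch of the DAPC-related graphics, again on nancycats. The DAPC settings (n.pca = 20, n.da = 2) are arbitrary values chosen only to obtain an object to plot.

```r
library(adegenet)
data(nancycats)

# Fit a DAPC to have something to display (illustrative settings)
dapc_cats <- dapc(nancycats, n.pca = 20, n.da = 2)

# Scatterplot of individuals on the discriminant axes (scatter.dapc method)
scatter(dapc_cats)

# Membership probabilities of each individual
compoplot(dapc_cats)

# Contributions of alleles to the first discriminant axis
loadingplot(dapc_cats$var.contr, axis = 1)
```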
=== SIMULATING DATA ===
- hybridize: implements hybridization between two populations.
- haploGen: simulates genealogies of haplotypes, storing full genomes.
- glSim: simulates simple genlight objects.
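Hybridization can be sketched with two breeds from the microbov dataset. The choice of breeds, the number of simulated hybrids (n = 40), and the label "hybrid" are illustrative assumptions.

```r
library(adegenet)
data(microbov)

# Extract two parental populations as separate genind objects
by_breed <- seppop(microbov)
salers <- by_breed$Salers
zebu   <- by_breed$Zebu

# Simulate 40 hybrid genotypes between the two parental populations
hyb <- hybridize(salers, zebu, n = 40, pop = "hybrid")
hyb
```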
=== DATASETS ===
- H3N2: Seasonal influenza (H3N2) HA segment data.
- dapcIllus: Simulated data illustrating the DAPC.
- eHGDP: Extended HGDP-CEPH dataset.
- microbov: Microsatellite genotypes of 15 cattle breeds.
- nancycats: Microsatellite genotypes of 237 cats from 17 colonies of Nancy (France).
- rupica: Microsatellite genotypes of 335 chamois (Rupicapra rupicapra) from the Bauges mountains (France).
- sim2pop: Simulated genotypes of two georeferenced populations.
- spcaIllus: Simulated data illustrating the sPCA.
For more information, visit the adegenet website using the function
adegenetWeb.
Tutorials are available via the command adegenetTutorial.
To cite adegenet, please use the reference given by citation("adegenet")
(or see references below).
Thibaut Jombart <t.jombart@imperial.ac.uk>
Developers: Zhian N. Kamvar <zkamvar@gmail.com>,
Caitlin Collins <caitiecollins17@gmail.com>,
Ismail Ahmed <ismail.ahmed@inserm.fr>,
Federico Calboli, Tobias Erik Reiners, Peter Solymos, Anne Cori.
Contributed datasets from: Katayoun Moazami-Goudarzi, Denis Laloë,
Dominique Pontier, Daniel Maillard, Francois Balloux.
Jombart T (2008) adegenet: a R package for the multivariate analysis of
genetic markers. Bioinformatics 24: 1403-1405. doi:
10.1093/bioinformatics/btn129
Jombart T and Ahmed I (2011) adegenet 1.3-1: new tools for the analysis of
genome-wide SNP data. Bioinformatics. doi: 10.1093/bioinformatics/btr521
Jombart T, Devillard S and Balloux F (2010) Discriminant analysis of
principal components: a new method for the analysis of genetically
structured populations. BMC Genetics 11: 94. doi: 10.1186/1471-2156-11-94
Jombart T, Eggo R, Dodd P and Balloux F (2010) Reconstructing disease
outbreaks from genetic data: a graph approach. Heredity. doi:
10.1038/hdy.2010.78
Jombart T, Devillard S, Dufour A-B and Pontier D (2008) Revealing cryptic
spatial patterns in genetic variability by a new multivariate method.
Heredity 101: 92–103.
See adegenet website: http://adegenet.r-forge.r-project.org/
Please post your questions on 'the adegenet forum': adegenet-forum@lists.r-forge.r-project.org
adegenet is related to several packages, in particular:
- ade4 for multivariate analysis
- pegas for population genetics tools
- ape for phylogenetics and DNA data handling
- seqinr for handling nucleic and proteic sequences
- shiny for R-based web interfaces