These functions implement the Discriminant Analysis of Principal
Components (DAPC, Jombart et al. 2010). This method descibes the
diversity between predefined groups. When groups are unknown, use
find.clusters
to infer genetic clusters. See 'details' section
for a succint description of the method, and
vignette("adegenetdapc")
for a tutorial. Graphical methods for
DAPC are documented in scatter.dapc
(see
?scatter.dapc
).
dapc
is a generic function performing the DAPC on the following
types of objects:
 data.frame
(only numeric data)
 matrix
(only numeric data)
 genind
objects (genetic markers)
 genlight
objects (genomewide SNPs)
These methods all return an object with class dapc
.
Functions that can be applied to these objects are (the ".dapc" can be ommitted):
 print.dapc
: prints the content of a dapc
object.
 summary.dapc
: extracts useful information from a dapc
object.
 predict.dapc
: predicts group memberships based on DAPC results.
 xvalDapc
: performs crossvalidation of DAPC using varying
numbers of PCs (and keeping the number of discriminant functions
fixed); it currently has methods for data.frame
and matrix
.
DAPC implementation calls upon dudi.pca
from the
ade4
package (except for genlight objects)
and lda
from the MASS
package. The
predict
procedure uses predict.lda
from the
MASS
package.
as.lda
is a generic with a method for dapc
object which
converts these objects into outputs similar to that of
lda.default
.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30  ## S3 method for class 'data.frame'
dapc(x, grp, n.pca=NULL, n.da=NULL, center=TRUE,
scale=FALSE, var.contrib=TRUE, var.loadings=FALSE, pca.info=TRUE,
pca.select=c("nbEig","percVar"), perc.pca=NULL, ..., dudi=NULL)
## S3 method for class 'matrix'
dapc(x, ...)
## S3 method for class 'genind'
dapc(x, pop=NULL, n.pca=NULL, n.da=NULL, scale=FALSE,
truenames=TRUE, var.contrib=TRUE, var.loadings=FALSE, pca.info=TRUE,
pca.select=c("nbEig","percVar"), perc.pca=NULL, ...)
## S3 method for class 'genlight'
dapc(x, pop=NULL, n.pca=NULL, n.da=NULL,
scale=FALSE, var.contrib=TRUE, var.loadings=FALSE, pca.info=TRUE,
pca.select=c("nbEig", "percVar"), perc.pca=NULL, glPca=NULL, ...)
## S3 method for class 'dudi'
dapc(x, grp, ...)
## S3 method for class 'dapc'
print(x, ...)
## S3 method for class 'dapc'
summary(object, ...)
## S3 method for class 'dapc'
predict(object, newdata, prior = object$prior, dimen,
method = c("plugin", "predictive", "debiased"), ...)

x 

grp,pop 
a 
n.pca 
an 
n.da 
an 
center 
a 
scale 
a 
var.contrib 
a 
var.loadings 
a 
pca.info 
a 
pca.select 
a 
perc.pca 
a 
... 
further arguments to be passed to other functions. For

glPca 
an optional 
object 
a 
truenames 
a 
dudi 
optionally, a multivariate analysis with the class

newdata 
an optional dataset of individuals whose membership is
seeked; can be a data.frame, a matrix, a genind or a
genlight object, but object class must match the
original ('training') data. In particular, variables must be exactly
the same as in the original data. For genind
objects, see 
prior,dimen,method 
see 
The Discriminant Analysis of Principal Components (DAPC) is designed to investigate the genetic structure of biological populations. This multivariate method consists in a twosteps procedure. First, genetic data are transformed (centred, possibly scaled) and submitted to a Principal Component Analysis (PCA). Second, principal components of PCA are submitted to a Linear Discriminant Analysis (LDA). A trivial matrix operation allows to express discriminant functions as linear combination of alleles, therefore allowing one to compute allele contributions. More details about the computation of DAPC are to be found in the indicated reference.
DAPC does not infer genetic clusters ex nihilo; for this, see the
find.clusters
function.
=== dapc objects ===
The class dapc
is a list with the following
components:
call 
the matched call. 
n.pca 
number of PCA axes retained 
n.da 
number of DA axes retained 
var 
proportion of variance conserved by PCA principal components 
eig 
a numeric vector of eigenvalues. 
grp 
a factor giving prior group assignment 
prior 
a numeric vector giving prior group probabilities 
assign 
a factor giving posterior group assignment 
tab 
matrix of retained principal components of PCA 
loadings 
principal axes of DAPC, giving coefficients of the linear combination of retained PCA axes. 
ind.coord 
principal components of DAPC, giving the coordinates of individuals onto principal axes of DAPC; also called the discriminant functions. 
grp.coord 
coordinates of the groups onto the principal axes of DAPC. 
posterior 
a data.frame giving posterior membership probabilities for all individuals and all clusters. 
var.contr 
(optional) a data.frame giving the contributions of original variables (alleles in the case of genetic data) to the principal components of DAPC. 
var.load 
(optional) a data.frame giving the loadings of original variables (alleles in the case of genetic data) to the principal components of DAPC. 
match.prp 
a list, where each item is the proportion of individuals correctly matched to their original population in crossvalidation. 
=== other outputs ===
Other functions have different outputs:
 summary.dapc
returns a list with 6 components: n.dim
(number
of retained DAPC axes), n.pop
(number of groups/populations),
assign.prop
(proportion of overall correct assignment),
assign.per.pop
(proportion of correct assignment per group),
prior.grp.size
(prior group sizes), and post.grp.size
(posterior
group sizes), xval.dapc
, xval.genind
and xval
(all return a list of four lists, each one with as many items as
crossvalidation runs. The first item is a list of assign
components, the secon is a list of posterior
components, the
thirs is a list of ind.score
components and the fourth is a
list of match.prp
items, i.e. the prortion of the validation
set correctly matched to its original population)
Thibaut Jombart t.jombart@imperial.ac.uk
Jombart T, Devillard S and Balloux F (2010) Discriminant analysis of principal components: a new method for the analysis of genetically structured populations. BMC Genetics11:94. doi:10.1186/147121561194
xvalDapc
: selection of the optimal numbers of PCA axes
retained in DAPC using crossvalidation.
scatter.dapc
, assignplot
,
compoplot
: graphics for DAPC.
find.clusters
: to identify clusters without prior.
dapcIllus
: a set of simulated data illustrating
the DAPC
eHGDP
, H3N2
: empirical datasets
illustrating DAPC
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106  ## data(dapcIllus), data(eHGDP), and data(H3N2) illustrate the dapc
## see ?dapcIllus, ?eHGDP, ?H3N2
##
## Not run:
example(dapcIllus)
example(eHGDP)
example(H3N2)
## End(Not run)
## H3N2 EXAMPLE ##
data(H3N2)
pop(H3N2) < factor(H3N2$other$epid)
dapc1 < dapc(H3N2, var.contrib=FALSE, scale=FALSE, n.pca=150, n.da=5)
## remove internal segments and ellipses, different pch, add MStree
scatter(dapc1, cell=0, pch=18:23, cstar=0, mstree=TRUE, lwd=2, lty=2)
## label individuals at the periphery
# air = 2 is a measure of how much space each label needs
# pch = NA suppresses plotting of points
scatter(dapc1, label.inds = list(air = 2, pch = NA))
## only ellipse, custom labels
scatter(dapc1, cell=2, pch="", cstar=0, posi.da="top",
label=paste("year\n",2001:2006), axesel=FALSE, col=terrain.colors(10))
## SHOW COMPOPLOT ON MICROBOV DATA ##
data(microbov)
dapc1 < dapc(microbov, n.pca=20, n.da=15)
compoplot(dapc1, lab="")
## Not run:
## EXAMPLE USING GENLIGHT OBJECTS ##
## simulate data
x < glSim(50,4e350, 50, ploidy=2)
x
plot(x)
## perform DAPC
dapc1 < dapc(x, n.pca=10, n.da=1)
dapc1
## plot results
scatter(dapc1, scree.da=FALSE)
## SNP contributions
loadingplot(dapc1$var.contr)
loadingplot(tail(dapc1$var.contr, 100), main="Loading plot  last 100 SNPs")
## USE "PREDICT" TO PREDICT GROUPS OF NEW INDIVIDUALS ##
## load data
data(sim2pop)
## we make a dataset of:
## 30 individuals from pop A
## 30 individuals from pop B
## 30 hybrids
## separate populations and make F1
temp < seppop(sim2pop)
temp < lapply(temp, function(e) hybridize(e,e,n=30)) # force equal popsizes
## make hybrids
hyb < hybridize(temp[[1]], temp[[2]], n=30)
## repool data  needed to ensure allele matching
newdat < repool(temp[[1]], temp[[2]], hyb)
pop(newdat) < rep(c("pop A", "popB", "hyb AB"), c(30,30,30))
## perform the DAPC on the first 2 pop (60 first indiv)
dapc1 < dapc(newdat[1:60],n.pca=5,n.da=1)
## plot results
scatter(dapc1, scree.da=FALSE)
## make prediction for the 30 hybrids
hyb.pred < predict(dapc1, newdat[61:90])
hyb.pred
## plot the inferred coordinates (circles are hybrids)
points(hyb.pred$ind.scores, rep(.1, 30))
## look at assignment using assignplot
assignplot(dapc1, new.pred=hyb.pred)
title("30 indiv popA, 30 indiv pop B, 30 hybrids")
## image using compoplot
compoplot(dapc1, new.pred=hyb.pred, ncol=2)
title("30 indiv popA, 30 indiv pop B, 30 hybrids")
## CROSSVALIDATION ##
data(sim2pop)
xval < xvalDapc(sim2pop@tab, pop(sim2pop), n.pca.max=100, n.rep=3)
xval
boxplot(xval$success~xval$n.pca, xlab="Number of PCA components",
ylab="Classification succes", main="DAPC  crossvalidation")
## End(Not run)

Questions? Problems? Suggestions? Tweet to @rdrrHQ or email at ian@mutexlabs.com.
Please suggest features or report bugs with the GitHub issue tracker.
All documentation is copyright its authors; we didn't write any of that.