snpzip: Identification of structural SNPs
In adegenet: Exploratory Analysis of Genetic and Genomic Data

Description

The function snpzip identifies the set of alleles which contribute most significantly to phenotypic structure.

This procedure uses Discriminant Analysis of Principal Components (DAPC) to quantify the contribution of individual alleles to between-population structure. Then, defining contribution to DAPC as the measure of distance between alleles, hierarchical clustering is used to identify two groups of alleles: structural SNPs and non-structural SNPs.

Usage

  snpzip(snps, y, plot = TRUE, xval.plot = FALSE, loading.plot = FALSE,
         method = c("complete", "single", "average", "centroid", 
                    "mcquitty", "median", "ward"), ...)

Arguments

`snps`	a `matrix` of SNP data used as input to DAPC.
`y`	either a `factor` indicating the group membership of individuals, or a dapc object.
`plot`	a `logical` indicating whether a graphical representation of the DAPC results should be displayed.
`xval.plot`	a `logical` indicating whether the results of the cross-validation step should be displayed (only if `y` is a factor).
`loading.plot`	a `logical` indicating whether a loading plot showing the SNP selection threshold should be displayed.
`method`	the clustering method to be used. This should be (an unambiguous abbreviation of) one of `"complete", "single", "average", "centroid", "mcquitty", "median",` or `"ward"`.
`...`	further arguments.
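
The phrase "unambiguous abbreviation" for `method` follows the usual base R convention for partial matching. The sketch below illustrates that convention with `match.arg()`; `resolve_method` is a hypothetical helper written for this illustration, and it is only an assumption that snpzip resolves `method` the same way internally.

```r
# Hypothetical wrapper, not part of adegenet: it only mirrors the choices
# listed for `method` and resolves abbreviations with base R's match.arg().
resolve_method <- function(method = c("complete", "single", "average",
                                      "centroid", "mcquitty", "median",
                                      "ward")) {
  match.arg(method)
}

resolve_method()        # no value supplied: the first choice is used
resolve_method("cen")   # unique prefix of "centroid"
resolve_method("av")    # unique prefix of "average"

# "m" is a prefix of both "mcquitty" and "median", so it is rejected.
ambiguous <- tryCatch(resolve_method("m"), error = function(e) "error")
ambiguous
```

Under this convention, a prefix shared by two method names is rejected, so it is safest to spell out enough of the name to single out one method.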

Details

snpzip provides an objective procedure to distinguish between structural and non-structural SNPs identified by Discriminant Analysis of Principal Components (DAPC, Jombart et al. 2010). When y is a factor, snpzip precedes the multivariate analysis with a cross-validation step to ensure that the subsequent DAPC is performed optimally. The contributions of alleles to the DAPC are then submitted to hclust, where they define a distance matrix upon which hierarchical clustering is carried out. To complete the procedure, snpzip uses cutree to automatically subdivide the set of SNPs fed into the analysis into two groups: those which contribute significantly to the phenotypic structure of interest, and those which do not.
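
The clustering step described above can be sketched in base R. This is a minimal illustration under stated assumptions, not snpzip's actual source: the vector `contrib` is simulated here, whereas in a real analysis the contributions would come from a fitted DAPC (for example its `var.contr` element), and the rule used to label one of the two groups as "structural" (the larger mean contribution) is an assumption of this sketch.

```r
set.seed(1)

# Hypothetical per-SNP contributions to one discriminant axis:
# 95 background SNPs with small contributions, 5 with clearly larger ones.
contrib <- c(runif(95, min = 0, max = 0.002),
             runif(5, min = 0.05, max = 0.08))
names(contrib) <- paste0("snp", seq_along(contrib))

# Pairwise distances between SNPs, measured on their contributions.
d <- dist(contrib)

# Hierarchical clustering, then cut the tree into exactly two groups.
tree <- hclust(d, method = "average")
groups <- cutree(tree, k = 2)

# Assumption of this sketch: the group with the larger mean contribution
# is the one labelled "structural".
group_means <- tapply(contrib, groups, mean)
structural_group <- as.integer(names(which.max(group_means)))
structural <- names(contrib)[groups == structural_group]

structural
```

Because the two simulated groups are well separated, any of the linkage methods listed under `method` should recover the same split here; with real data the choice of method can change which SNPs fall on either side of the cut.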

Value

A list with four items if y is a factor, or two items if y is a dapc object.

The first item gives the number of principal components (PCs) of PCA retained in the DAPC.

The second item is an embedded list which first indicates the number of structural and non-structural SNPs identified by snpzip, second provides a list of the structuring alleles, third gives the names of the selected alleles, and fourth details the contributions of these structuring alleles to the DAPC.

The optional third item provides measures of discrimination success both overall and by group.

The optional fourth item contains the dapc object generated if y was a factor.

If plot=TRUE, a scatter plot will provide a visualization of the DAPC results.

If xval.plot=TRUE, the results of the cross-validation step will be displayed as an array of the format generated by xvalDapc, and a scatter plot of the results of cross-validation will be provided.

If loading.plot=TRUE, a loading plot will be generated to show the contributions of alleles to the DAPC, and the SNP selection threshold will be indicated. If the number of Discriminant Axes (n.da) in the DAPC is greater than 1, loading.plot=TRUE will generate one loading plot for each discriminant axis.

Author(s)

Caitlin Collins caitlin.collins12@imperial.ac.uk

References

Jombart T, Devillard S and Balloux F (2010) Discriminant analysis of principal components: a new method for the analysis of genetically structured populations. BMC Genetics 11:94. doi:10.1186/1471-2156-11-94

Examples

  ## Not run: 
    simpop <- glSim(100, 10000, n.snp.struc = 10, grp.size = c(0.3,0.7), 
                    LD = FALSE, alpha = 0.4, k = 4)
    snps <- as.matrix(simpop)
    phen <- simpop@pop
    
    outcome <- snpzip(snps, phen, method = "centroid")
    outcome
  
## End(Not run)
  ## Not run: 
    simpop <- glSim(100, 10000, n.snp.struc = 10, grp.size = c(0.3,0.7), 
                    LD = FALSE, alpha = 0.4, k = 4)
    snps <- as.matrix(simpop)
    phen <- simpop@pop
    
    dapc1 <- dapc(snps, phen, n.da = 1, n.pca = 30)
    
    features <- snpzip(snps, dapc1, loading.plot = TRUE, method = "average")
    features
  
## End(Not run)

adegenet documentation built on April 4, 2025, 1:15 a.m.