dapc_infer: Conduct an inference of _k_ prior to DAPC.
In j-a-thia/genomalicious: A smorgasbord of R functions for population genomic analyses

dapc_infer

R Documentation

Conduct an inference of k prior to DAPC.

Description

Takes a long-format data table of genotypes and assists in a preliminary inference of k, the effective number of populations. Inference of k is facilitated through examination of the PCA screeplot and through K-means clustering.

Usage

dapc_infer(
  dat,
  scaling = "covar",
  sampCol = "SAMPLE",
  locusCol = "LOCUS",
  genoCol = "GT",
  kTest = 1:10L,
  pTest,
  screeMax = 20L,
  plotLook = "ggplot"
)

Arguments

`dat`	Data table: A long-format data table, e.g. as imported with `vcf2DT`. Genotypes can be coded as '/'-separated characters (e.g. '0/0', '0/1', '1/1') or as integer counts of the Alt allele (e.g. 0, 1, 2). Must contain the following columns: (1) the sampled individuals (see param `sampCol`); (2) the locus ID (see param `locusCol`); (3) the genotype (see param `genoCol`).
`scaling`	Character: How should the data (loci) be scaled? Set to `'covar'` to centre to mean = 0 without adjusting the variance, i.e. PCA on a covariance matrix. Set to `'corr'` to scale to mean = 0 and variance = 1, i.e. PCA on a correlation matrix. Set to `'patterson'` to use the Patterson et al. (2006) normalisation. Set to `'none'` if you do not want any scaling before PCA.
`sampCol`	Character: The column name with the sampled individual information. Default is `'SAMPLE'`.
`locusCol`	Character: The column name with the locus information. Default is `'LOCUS'`.
`genoCol`	Character: The column name with the genotype information. Default is `'GT'`.
`kTest`	Integer: A vector of k values (candidate numbers of populations) to test. Default is `1:10`.
`pTest`	Integer: A vector of p values, the numbers of PC axes with which to fit K-means.
`screeMax`	Integer: The maximum number of PC axes to plot in the screeplot. Default is `20`.
`plotLook`	Character: The look of the plot. Default = `'ggplot'`, the typical gray background with gridlines produced by `ggplot2`. Alternatively, when set to `'classic'`, produces a base R style plot.
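As a rough illustration of the input format and of the `scaling` options, the sketch below builds a tiny long-format table in base R, converts '/'-separated genotypes to Alt allele counts, reshapes them into a sample-by-locus matrix, and applies three scalings. The toy data and all object names are hypothetical. The Patterson et al. (2006) normalisation is written as published (centre each locus, then divide by sqrt(p(1 - p)), with p half the mean genotype); whether dapc_infer computes it in exactly this way internally is an assumption.

```r
# Toy long-format genotypes (hypothetical data): one row per sample x locus
dat <- data.frame(
  SAMPLE = rep(c("Ind1", "Ind2", "Ind3", "Ind4"), each = 3),
  LOCUS  = rep(c("Loc1", "Loc2", "Loc3"), times = 4),
  GT     = c("0/0", "0/1", "1/1",
             "0/1", "0/1", "0/0",
             "1/1", "0/0", "0/1",
             "0/1", "1/1", "0/1"),
  stringsAsFactors = FALSE
)

# Convert '/'-separated genotypes into counts of the Alt allele (0, 1, 2)
dat$ALT <- vapply(
  strsplit(dat$GT, "/", fixed = TRUE),
  function(alleles) sum(as.integer(alleles)),
  integer(1)
)

# Reshape to a wide matrix: samples in rows, loci in columns
genoMat <- tapply(dat$ALT, list(dat$SAMPLE, dat$LOCUS), sum)

# 'covar'-style scaling: centre each locus, leave the variance alone
covarMat <- scale(genoMat, center = TRUE, scale = FALSE)

# 'corr'-style scaling: centre each locus and scale to unit variance
corrMat <- scale(genoMat, center = TRUE, scale = TRUE)

# 'patterson'-style scaling (assumed form): centre, then divide by
# sqrt(p * (1 - p)), where p is the estimated Alt allele frequency
p <- colMeans(genoMat) / 2
pattMat <- sweep(covarMat, 2, sqrt(p * (1 - p)), "/")

print(genoMat)
print(round(pattMat, 2))
```

Loci that are monomorphic have p equal to 0 or 1 and zero variance, so they would need to be removed before the 'corr'-style or 'patterson'-style scaling shown here.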

Details

DAPC (discriminant analysis of principal components) became popular in the population genetics/molecular ecology community following Jombart et al.'s (2010) paper. The method uses a discriminant analysis (DA) to model the genetic differences among populations, using PC axes of genotypes as predictors.

The choice of the number of PC axes to use as predictors of genetic differences among populations should be determined using the k-1 criterion described in Thia (2022). This criterion is based on the findings of Patterson et al. (2006) that only the leading k-1 PC axes of a genotype dataset capture biologically meaningful structure. Users can use the function genomalicious::dapc_infer to examine eigenvalue screeplots and perform K-means clustering with different parameters to infer the number of biologically informative PC axes.
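To make the k-1 criterion concrete, here is a minimal base R sketch that simulates genotypes for three strongly differentiated populations and inspects the PCA eigenvalues. The simulation settings and object names are hypothetical, and `prcomp` stands in for whatever PCA routine dapc_infer uses internally. With k = 3 simulated populations, the expectation is that the first k-1 = 2 eigenvalues stand apart from the rest.

```r
set.seed(1)

# Simulate Alt allele counts for k = 3 hypothetical populations
nPops <- 3
nInd  <- 20   # individuals per population
nLoci <- 200

# Independent allele frequencies per population give strong structure
popFreqs <- matrix(runif(nPops * nLoci, 0.1, 0.9), nrow = nPops)
pops <- rep(seq_len(nPops), each = nInd)

# Each genotype is a binomial draw of 2 alleles at the population frequency
genoMat <- matrix(
  rbinom(length(pops) * nLoci, size = 2, prob = popFreqs[pops, ]),
  nrow = length(pops)
)

# PCA on the covariance matrix (centred, variance not adjusted)
pca <- prcomp(genoMat, center = TRUE, scale. = FALSE)
eig <- pca$sdev^2

# Leading eigenvalues: look for the break after axis k - 1
print(round(eig[1:6], 2))

# A simple screeplot of the first 10 eigenvalues
barplot(eig[1:10], names.arg = 1:10, xlab = "PC axis", ylab = "Eigenvalue")

# Under the k-1 criterion, retain k - 1 axes as predictors
nKeep <- nPops - 1
```

In real data the break is often less clean than in this simulation, which is why the screeplot is best read alongside the K-means results.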

Users should examine both the screeplot of eigenvalues and the different K-means plots produced. The screeplot typically exhibits a break in the scree around the putative k. Additionally, different parameterisations of K-means clustering should converge on a similar conclusion. Users may also find it useful to visualise scatterplots, e.g., using pca_genos.
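The idea of comparing K-means fits across parameterisations can be sketched in base R as a grid over k and p. This is only an illustration on simulated data with hypothetical object names. The BIC-style score used here, n * log(WSS / n) + k * log(n), is one commonly used approximation; the exact statistic reported by dapc_infer may differ.

```r
set.seed(42)

# Simulate Alt allele counts for 3 hypothetical populations
nPops <- 3; nInd <- 20; nLoci <- 200
popFreqs <- matrix(runif(nPops * nLoci, 0.1, 0.9), nrow = nPops)
pops <- rep(seq_len(nPops), each = nInd)
genoMat <- matrix(
  rbinom(length(pops) * nLoci, size = 2, prob = popFreqs[pops, ]),
  nrow = length(pops)
)
pca <- prcomp(genoMat, center = TRUE, scale. = FALSE)

# Grid of k (clusters) and p (PC axes) values to test
kTest <- 1:6
pTest <- c(2, 5, 10)
grid <- expand.grid(k = kTest, p = pTest)

# Fit K-means on the first p PC axes for every k and p combination
fits <- lapply(seq_len(nrow(grid)), function(i) {
  kmeans(
    pca$x[, seq_len(grid$p[i]), drop = FALSE],
    centers = grid$k[i],
    nstart = 10
  )
})
names(fits) <- paste0("k=", grid$k, ",p=", grid$p)

# BIC-style score from the total within-cluster sum of squares (assumed form)
n <- nrow(genoMat)
grid$wss <- unname(vapply(fits, function(f) f$tot.withinss, numeric(1)))
grid$bic <- n * log(grid$wss / n) + grid$k * log(n)

# Compare how the score changes with k for each p
print(grid[grid$p == 2, c("k", "p", "bic")])
print(grid[grid$p == 10, c("k", "p", "bic")])
```

When reading such a table, the useful signal is usually the k at which the score stops dropping sharply, and whether that k is stable across the different values of p.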

This function can also be used to determine populations de novo if the user has no a priori expectation of the number of populations and the designation of individuals. See Miller et al. (2020) and Thia (2022) for the distinction between, and importance of, a priori vs. de novo population designations. The function returns all K-means solutions for all parameter combinations. After inspecting the screeplot and the K-means solutions, the desired K-means fit can be extracted from the returned object to obtain de novo population designations for downstream analysis, e.g., with genomalicious::dapc_fit.
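A minimal sketch of turning a chosen K-means fit into de novo population designations is given below, using base R on simulated data. The object names and the `Pop` labels are hypothetical. Because `$fit` is documented as holding outputs from `kmeans`, it is assumed here that a fit extracted from dapc_infer (e.g. the `k=3,p=3` element) carries the same `$cluster` vector as the standard `stats::kmeans` result used in this sketch.

```r
set.seed(7)

# Simulate Alt allele counts for 3 hypothetical populations
nPops <- 3; nInd <- 20; nLoci <- 200
popFreqs <- matrix(runif(nPops * nLoci, 0.1, 0.9), nrow = nPops)
pops <- rep(seq_len(nPops), each = nInd)
genoMat <- matrix(
  rbinom(length(pops) * nLoci, size = 2, prob = popFreqs[pops, ]),
  nrow = length(pops)
)
rownames(genoMat) <- paste0("Ind", seq_len(nrow(genoMat)))

# PCA, then the chosen K-means solution: k = 3 fitted with p = 2 PC axes
pca <- prcomp(genoMat, center = TRUE, scale. = FALSE)
fit <- kmeans(pca$x[, 1:2], centers = 3, nstart = 10)

# De novo designations: one row per sample with its inferred cluster
deNovo <- data.frame(
  SAMPLE = names(fit$cluster),
  POP    = paste0("Pop", fit$cluster),
  stringsAsFactors = FALSE
)
print(head(deNovo))

# Cross-tabulate the simulated populations against the inferred clusters
print(table(simulated = pops, inferred = deNovo$POP))
```

The cluster numbers assigned by K-means are arbitrary labels, so the inferred `Pop1` need not correspond to the first simulated population.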

Value

Returns a list: $tab is a data table of the k and p values examined and their associated BIC values. $fit contains the individual outputs from kmeans for each combination of parameters fitted. $plot is a ggplot object.

References

Jombart et al. (2010) BMC Genetics. DOI: 10.1186/1471-2156-11-94
Miller et al. (2020) Heredity. DOI: 10.1038/s41437-020-0348-2
Patterson et al. (2006) PLoS Genetics. DOI: 10.1371/journal.pgen.0020190
Thia (2022) Mol. Ecol. DOI: 10.1111/1755-0998.13706

Examples

library(genomalicious)

data(data_Genos)

# Test k = 1 to 10 using p = 3, 10, 20, and 40 PC axes, plotting just the
# first 10 eigenvalues from the PCA, with a ggplot flavour.
inferK <- dapc_infer(
   data_Genos,
   kTest=1:10L,
   pTest=c(3,10,20,40),
   screeMax=10L,
   plotLook='ggplot'
)

# Tabulated statistics
inferK$tab

# The K-means clustering results for k=3 fitted with p=3 PC axes
inferK$fit$`k=3,p=3`

# The plot
inferK$plot

j-a-thia/genomalicious documentation built on April 13, 2025, 9:41 a.m.
