pca_genos: Perform a PCA (principal components analysis) on individual...

View source: R/pca_genos.R

pca_genos    R Documentation

Perform a PCA (principal components analysis) on individual genotypes

Description

Takes a long-format data table of genotypes and conducts a PCA using R's prcomp() function. Different options for scaling the genotypes pre-PCA are available.

Usage

pca_genos(
  dat,
  scaling = "covar",
  sampCol = "SAMPLE",
  locusCol = "LOCUS",
  genoCol = "GT",
  popCol = NULL
)

Arguments

dat

Data table: A long-format data table. Genotypes can be coded as '/' separated characters (e.g. '0/0', '0/1', '1/1'), or as integers of Alt allele counts (e.g. 0, 1, 2). Must contain the following columns:

  1. The sampled individuals (see param sampCol).

  2. The locus ID (see param locusCol).

  3. The genotype column (see param genoCol).

Optionally, a population ID column can also be included (see param popCol).
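To make the expected input concrete, the sketch below builds a small long-format table in base R and converts '/' separated genotypes into Alt allele counts. The toy data, the helper gt_to_count() and the GT_INT column are hypothetical names used for illustration only; they are not part of genomalicious, and the package may handle the conversion differently internally.

```r
# Toy long-format genotype table (hypothetical data, not from the package).
# One row per sample-by-locus combination.
dat <- data.frame(
  SAMPLE = rep(c("ind1", "ind2", "ind3"), each = 2),
  LOCUS  = rep(c("loc1", "loc2"), times = 3),
  GT     = c("0/0", "0/1", "0/1", "1/1", "1/1", "0/0"),
  POP    = rep(c("popA", "popA", "popB"), each = 2),
  stringsAsFactors = FALSE
)

# Hypothetical helper: convert '/' separated genotypes to counts of the
# Alt allele (0, 1 or 2) by summing the allele codes in each genotype.
gt_to_count <- function(gt) {
  vapply(strsplit(gt, "/", fixed = TRUE),
         function(a) sum(as.integer(a)),
         integer(1))
}
dat$GT_INT <- gt_to_count(dat$GT)

# Reshape to a sample-by-locus matrix, the layout prcomp() works on.
geno_mat <- tapply(dat$GT_INT, list(dat$SAMPLE, dat$LOCUS), sum)
```

Either coding ('0/1' style characters or integer counts) carries the same information for biallelic loci; the integer form is simply what a PCA operates on.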

scaling

Character: How should the data (loci) be scaled? Default is 'covar', which centres each locus to mean = 0 but does not adjust the variance, i.e. PCA on a covariance matrix. Set to 'corr' to scale to mean = 0 and variance = 1, i.e. PCA on a correlation matrix. Set to 'patterson' to use the Patterson et al. (2006) normalisation. Set to 'none' if you do not want to do any scaling before PCA.
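As a rough illustration of what the four scaling options mean, the sketch below applies each one to a toy sample-by-locus matrix of Alt allele counts using base R's prcomp(). It assumes diploid, biallelic loci and takes the Patterson et al. (2006) normalisation to be centring each locus and dividing by sqrt(p * (1 - p)), where p is the Alt allele frequency. This is not the package's source code, and pca_genos() may differ in detail.

```r
# Toy sample-by-locus matrix of Alt allele counts (hypothetical data)
geno <- matrix(
  c(0, 1, 2, 1,
    1, 2, 0, 1,
    2, 0, 1, 0,
    0, 1, 1, 2,
    1, 2, 2, 1),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("ind", 1:5), paste0("loc", 1:4))
)

# 'covar': centre each locus, leave the variance unadjusted
pca_covar <- prcomp(geno, center = TRUE, scale. = FALSE)

# 'corr': centre each locus and scale it to unit variance
pca_corr <- prcomp(geno, center = TRUE, scale. = TRUE)

# 'patterson' (assumed form): centre each locus, then divide by
# sqrt(p * (1 - p)), with p the estimated Alt allele frequency
p <- colMeans(geno) / 2
geno_pat <- sweep(geno, 2, colMeans(geno), "-")
geno_pat <- sweep(geno_pat, 2, sqrt(p * (1 - p)), "/")
pca_pat <- prcomp(geno_pat, center = FALSE, scale. = FALSE)

# 'none': raw allele counts, with no centring or scaling
pca_none <- prcomp(geno, center = FALSE, scale. = FALSE)
```

Note that 'corr' and 'patterson' both divide by a per-locus quantity, so monomorphic loci (zero variance, or p equal to 0 or 1) would need to be removed before either is applied.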

sampCol

Character: The column name with the sampled individual information. Default is 'SAMPLE'.

locusCol

Character: The column name with the locus information. Default is 'LOCUS'.

genoCol

Character: The column name with the genotype information. Default is 'GT'.

popCol

Character: An optional argument. The column name with the population information. Default is NULL. If specified, population membership is stored in the returned object.

Value

Returns a prcomp object. If argument popCol was specified, an additional index of $pops is also present.
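The sketch below shows how the standard components of a prcomp object might be used alongside population labels. It builds a stand-in object from simulated data rather than calling pca_genos(), and it assumes that $pops holds one population label per sample in the same order as the rows of $x; that layout is an assumption and should be checked against real output.

```r
# Stand-in prcomp object built from simulated genotypes
# (hypothetical; this is not output from pca_genos())
set.seed(1)
geno <- matrix(rbinom(6 * 10, size = 2, prob = 0.4), nrow = 6,
               dimnames = list(paste0("ind", 1:6), paste0("loc", 1:10)))
pca <- prcomp(geno, center = TRUE, scale. = FALSE)

# Mimic the extra $pops index (assumed: one label per sample)
pca$pops <- rep(c("popA", "popB"), each = 3)

# Standard prcomp components
scores   <- pca$x                          # sample coordinates on each PC
loadings <- pca$rotation                   # locus loadings on each PC
var_expl <- pca$sdev^2 / sum(pca$sdev^2)   # proportion of variance per PC

# Combine the first two PCs with population labels, e.g. for plotting
pc_df <- data.frame(SAMPLE = rownames(scores), POP = pca$pops,
                    PC1 = scores[, 1], PC2 = scores[, 2])
```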

References

Patterson et al. (2006) Population structure and eigenanalysis. PLOS Genetics.

Examples

library(genomalicious)

# Data
data(data_Genos)
data_Genos

# Conduct the PCA with Patterson et al.'s (2006) normalisation, and
# population specified
pca <- pca_genos(dat=data_Genos, scaling='patterson', popCol='POP')

# Plot the PCA
pca_plot(pca)


j-a-thia/genomalicious documentation built on Oct. 19, 2024, 7:51 p.m.