pca: Principal component analysis

View source: R/stats.R

pca R Documentation

Principal component analysis

Description

Given a data matrix X with N rows and P columns, principal component analysis can be performed using the singular value decomposition (SVD), X = U D V^T, where U is NxN, D is NxN and diagonal (singular values), and V is PxN (these dimensions correspond to the thin SVD when N <= P). Another way to perform it is to first compute a symmetric matrix, S (e.g. the scatter matrix X X^T, but not necessarily), and then to use its eigendecomposition (EVD), S = Q Delta Q^-1, where Q is NxN and Delta is NxN and diagonal (eigenvalues). TODO: for large matrices, use the RSpectra package, which can compute only the k largest singular values and the corresponding singular vectors.
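
The link between the two routes can be checked with base R only. The following is a minimal sketch, not the internal code of pca(): the toy matrix and all object names are assumptions made for illustration. It shows that the eigenvalues of the scatter matrix are the squared singular values of the centered data matrix, and that both routes give the same rotated data up to the sign of each column.

```r
## toy data (assumption): fewer units than variables, i.e. N <= P
set.seed(1)
N <- 5; P <- 8
X <- matrix(rnorm(N * P), nrow=N, ncol=P)
Xc <- scale(X, center=TRUE, scale=FALSE)  # center the columns

## route 1: SVD of the centered data matrix, Xc = U D V^T
res.svd <- svd(Xc)
rot.svd <- Xc %*% res.svd$v               # rotated data, equal to U D

## route 2: EVD of the scatter matrix S = Xc Xc^T (N x N)
S <- tcrossprod(Xc)
res.evd <- eigen(S, symmetric=TRUE)       # S = Q Delta Q^T

## the eigenvalues of S are the squared singular values of Xc
all.equal(res.evd$values, res.svd$d^2)

## centering removes one dimension, hence only N-1 informative components
k <- N - 1
rot.evd <- res.evd$vectors[, 1:k] %*% diag(sqrt(res.evd$values[1:k]))

## eigenvectors and rotated data agree up to the sign of each column
all.equal(abs(res.evd$vectors[, 1:k]), abs(res.svd$u[, 1:k]))
all.equal(abs(rot.evd), abs(rot.svd[, 1:k]))
```

Absolute values are compared because singular vectors and eigenvectors are only defined up to their sign.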

Usage

pca(
  X = NULL,
  S = NULL,
  ct = TRUE,
  sc = FALSE,
  plot = NULL,
  main = "PCA",
  cols = NULL,
  pchs = NULL,
  ES10 = FALSE
)

Arguments

X

data matrix with N rows ("units") and P columns ("variables"); P can be equal to N but X shouldn't be symmetric; a data frame will be converted into a matrix; specify X or S, but not both

S

symmetric matrix with N rows and columns; a data frame will be converted into a matrix; specify X or S, but not both

ct

if TRUE, the columns of X will be centered (recommended); a good reason to center the data matrix for PCA is given in Miranda et al. (2008)

sc

if TRUE, the columns of X will be scaled/standardized (useful if the variables are measured in different units)

plot

if not NULL, use "points" to show a plot with points of PC1 versus PC2, and "text" to use text with row names of X as labels (use plotPca to use other axes)

main

main title of the plot

cols

N-vector of colors (will be "black" by default)

pchs

N-vector of point symbols; used if plot="points"; will be 20 by default

ES10

if TRUE (and X is specified), the Lambda (= U) and F (= D V^T) matrices from Engelhardt and Stephens (2010) are also returned
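
The factorization behind ES10 can be sketched with base R as follows. This only illustrates the definitions Lambda = U and F = D V^T given above; the toy data and the object names (Lambda, F.mat) are assumptions, and the names of the corresponding components in the list returned by pca() are not documented here.

```r
## toy data (assumption): N = 4 units, P = 6 variables
set.seed(2)
X <- matrix(rnorm(4 * 6), nrow=4, ncol=6)
Xc <- scale(X, center=TRUE, scale=FALSE)   # center the columns

res.svd <- svd(Xc)                         # Xc = U D V^T
Lambda <- res.svd$u                        # N x N
F.mat <- diag(res.svd$d) %*% t(res.svd$v)  # N x P ("F" alone would mask FALSE)

## the product Lambda F recovers the centered data matrix
all.equal(Lambda %*% F.mat, Xc, check.attributes=FALSE)
```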

Value

list with (1) if X is given, the rotated data matrix (= X V), whose rows correspond to the original rows after translation towards the sample mean (if ct=TRUE) and rotation onto the "principal components" (eigenvectors of the sample covariance matrix), (2) if X is given, the singular values, (3) the eigenvalues, and (4) the proportion of variance explained by each PC
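
How the singular values, eigenvalues and proportions of variance relate to each other can be sketched with base R. The code below is an illustration under assumptions, not the code of pca(): in particular, dividing the squared singular values by N-1 (to obtain the eigenvalues of the sample covariance matrix) is an assumption about the scaling, whereas the proportions of variance do not depend on it. The object names only mimic those used in the examples.

```r
## toy data (assumption): N = 20 units, P = 4 variables
set.seed(3)
X <- matrix(rnorm(20 * 4), nrow=20, ncol=4)
Xc <- scale(X, center=TRUE, scale=FALSE)   # center the columns

sgl.values <- svd(Xc)$d
## assumed scaling: eigenvalues of the sample covariance matrix
eigen.values <- sgl.values^2 / (nrow(X) - 1)
prop.vars <- eigen.values / sum(eigen.values)

## same eigenvalues from the EVD of the sample covariance matrix
all.equal(eigen.values, eigen(cov(X), symmetric=TRUE)$values)

## same quantities from prcomp()
out.prcomp <- prcomp(X, center=TRUE, scale.=FALSE)
all.equal(eigen.values, out.prcomp$sdev^2)
all.equal(prop.vars, out.prcomp$sdev^2 / sum(out.prcomp$sdev^2))
```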

Author(s)

Timothee Flutre

See Also

plotPca

Examples

## Not run: 
## simulate genotypes from 3 populations
set.seed(1859)
genomes <- simulCoalescent(nb.inds=300, nb.pops=3, mig.rate=3)
X <- genomes$genos
table(inds.per.pop <- kmeans(X, 3)$cluster)
A <- estimGenRel(X)
imageWithScale(A, main="Additive genetic relationships") # we clearly see 3 clusters

## prcomp() uses svd()
out.prcomp <- prcomp(x=X, retx=TRUE, center=TRUE, scale.=FALSE)
summary(out.prcomp)$importance[,1:4]
out.prcomp$sdev[1:4]
(out.prcomp$sdev^2 / sum(out.prcomp$sdev^2))[1:4]
head(out.prcomp$rotation[, 1:4]) # first four PCs (i.e. eigenvectors)
head(out.prcomp$x[, 1:4]) # rotated data (= data x rotation matrix)

## princomp() uses eigen() and requires more units than variables
out.princomp <- princomp(x=X)

## this function fed with the data matrix
out.pca.X <- pca(X=X, ct=TRUE, sc=FALSE)
out.pca.X$sgl.values[1:4]
out.pca.X$eigen.values[1:4]
out.pca.X$prop.vars[1:4]
head(out.pca.X$rot.dat[, 1:4]) # rotated data

## this function fed with the scatter matrix
S <- tcrossprod(scale(X, center=TRUE, scale=FALSE))
out.pca.S <- pca(S=S)
out.pca.S$eigen.values[1:4]
out.pca.S$prop.vars[1:4]
head(out.pca.S$rot.dat[, 1:4]) # rotated data

## End(Not run)

timflutre/rutilstimflutre documentation built on Feb. 7, 2024, 8:17 a.m.