pca | R Documentation |
Given a data matrix X with N rows and P columns, principal component analysis can be performed using the singular value decomposition (SVD), X = U D V^T, where (for the thin SVD with N <= P) U is NxN, D is NxN and diagonal (singular values), and V is PxN. Alternatively, one can first compute a symmetric matrix S (e.g. the scatter matrix X X^T, but not necessarily) and then use its eigendecomposition (EVD), S = Q Delta Q^-1, where Q is NxN and Delta is NxN and diagonal (eigenvalues); because S is symmetric, Q is orthogonal, so Q^-1 = Q^T. TODO: for large matrices, use the RSpectra package, which can compute only the k largest singular values and the corresponding singular vectors.
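The equivalence between the two routes can be checked with base R alone. The following is a minimal sketch on simulated data, not the internal code of pca(); the object names (sv, ev) are illustrative.

```r
## small simulated data matrix, N units x P variables (N <= P)
set.seed(1)
N <- 5; P <- 8
X <- scale(matrix(rnorm(N * P), nrow = N, ncol = P),
           center = TRUE, scale = FALSE)

## route 1: SVD of the centered data matrix
sv <- svd(X)

## route 2: EVD of the scatter matrix S = X X^T (N x N, symmetric)
S <- tcrossprod(X)
ev <- eigen(S, symmetric = TRUE)

## the eigenvalues of S are the squared singular values of X
all.equal(ev$values, sv$d^2)

## Q is orthogonal, hence Q^-1 = Q^T
all.equal(crossprod(ev$vectors), diag(N))

## eigenvectors of S match the left singular vectors of X up to sign;
## the last one is skipped because centering makes its eigenvalue ~ 0
k <- N - 1
all.equal(abs(ev$vectors[, 1:k]), abs(sv$u[, 1:k]))
```

Signs of singular vectors and eigenvectors are arbitrary, which is why the comparison is made on absolute values.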
pca(
X = NULL,
S = NULL,
ct = TRUE,
sc = FALSE,
plot = NULL,
main = "PCA",
cols = NULL,
pchs = NULL,
ES10 = FALSE
)
X |
data matrix with N rows ("units") and P columns ("variables"); P can be equal to N but X shouldn't be symmetric; a data frame will be converted into a matrix; specify X or S, but not both |
S |
symmetric matrix with N rows and columns; a data frame will be converted into a matrix; specify X or S, but not both |
ct |
if TRUE, the columns of X will be centered (recommended); a good reason to center the data matrix for PCA is given in Miranda et al. (2008) |
sc |
if TRUE, the columns of X will be scaled/standardized (useful when the variables are measured in different units) |
plot |
if not NULL, use "points" to show a plot with |
main |
main title of the plot |
cols |
N-vector of colors (will be |
pchs |
N-vector of point symbols; used if |
ES10 |
if TRUE (and X is specified), the Lambda (= U) and F (= D V^T) matrices from Engelhardt and Stephens (2010) are also returned |
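As an illustration of the ct, sc and ES10 arguments, the sketch below centers a simulated matrix with scale() and builds the two factors Lambda = U and F = D V^T from svd(). It assumes that ct and sc behave like the center and scale arguments of scale(); the names Xc, Lambda and F.mat are hypothetical and not taken from the package.

```r
set.seed(2)
N <- 6; P <- 4
X <- matrix(rnorm(N * P, mean = 10), nrow = N, ncol = P)

## assumed equivalent of ct=TRUE, sc=FALSE: center the columns only
Xc <- scale(X, center = TRUE, scale = FALSE)
max(abs(colMeans(Xc)))  # numerically zero

## thin SVD; here N > P, so U is N x P rather than N x N
sv <- svd(Xc)
Lambda <- sv$u                    # factor "Lambda" (= U)
F.mat <- diag(sv$d) %*% t(sv$v)   # factor "F" (= D V^T)

## the product of the two factors recovers the centered data matrix
all.equal(Lambda %*% F.mat, Xc, check.attributes = FALSE)
```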
list with (1) if X is given, the rotated data matrix (= X V), whose rows correspond to the original rows after translation towards the sample mean (if ct=TRUE) and rotation onto the "principal components" (eigenvectors of the sample covariance matrix); (2) if X is given, the singular values; (3) the eigenvalues; and (4) the proportions of variance explained per PC
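The returned quantities can be reproduced from svd() as a rough cross-check. This sketch reuses the names rot.dat and prop.vars from the examples for readability, but it is not the function's own code. In particular, dividing the squared singular values by N - 1 gives the eigenvalues of the sample covariance matrix; whether pca() applies this normalization is an assumption, and the proportions of variance do not depend on it.

```r
set.seed(3)
N <- 10; P <- 4
X <- matrix(rnorm(N * P), nrow = N, ncol = P)
Xc <- scale(X, center = TRUE, scale = FALSE)
sv <- svd(Xc)

rot.dat <- Xc %*% sv$v             # rotated data, X V (= U D)
eig.cov <- sv$d^2 / (N - 1)        # eigenvalues of the sample covariance
prop.vars <- sv$d^2 / sum(sv$d^2)  # proportion of variance per PC

## X V and U D are the same matrix
all.equal(rot.dat, sv$u %*% diag(sv$d))

## cross-check against prcomp(), comparing scores up to sign
pc <- prcomp(X, center = TRUE, scale. = FALSE)
all.equal(sqrt(eig.cov), pc$sdev)
all.equal(abs(rot.dat), abs(pc$x), check.attributes = FALSE)
```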
Timothee Flutre
plotPca
## Not run:
## simulate genotypes from 3 populations
set.seed(1859)
genomes <- simulCoalescent(nb.inds=300, nb.pops=3, mig.rate=3)
X <- genomes$genos
table(inds.per.pop <- kmeans(X, 3)$cluster)
A <- estimGenRel(X)
imageWithScale(A, main="Additive genetic relationships") # we clearly see 3 clusters
## prcomp() uses svd()
out.prcomp <- prcomp(x=X, retx=TRUE, center=TRUE, scale.=FALSE)
summary(out.prcomp)$importance[,1:4]
out.prcomp$sdev[1:4]
(out.prcomp$sdev^2 / sum(out.prcomp$sdev^2))[1:4]
head(out.prcomp$rotation[, 1:4]) # first four PCs (i.e. eigenvectors)
head(out.prcomp$x[, 1:4]) # rotated data (= data x rotation matrix)
## princomp() uses eigen() and requires more units than variables
out.princomp <- princomp(x=X)
## this function fed with the data matrix
out.pca.X <- pca(X=X, ct=TRUE, sc=FALSE)
out.pca.X$sgl.values[1:4]
out.pca.X$eigen.values[1:4]
out.pca.X$prop.vars[1:4]
head(out.pca.X$rot.dat[, 1:4]) # rotated data
## this function fed with the scatter matrix
S <- tcrossprod(scale(X, center=TRUE, scale=FALSE))
out.pca.S <- pca(S=S)
out.pca.S$eigen.values[1:4]
out.pca.S$prop.vars[1:4]
head(out.pca.S$rot.dat[, 1:4]) # rotated data
## End(Not run)