pca_tidiers: Tidying methods for a Principal Component Analysis


Description

Extract diagnostics, coordinates on the principal components (i.e. row scores and column loadings) and some fit statistics from a Principal Component Analysis.

Usage

tidy(x, ...)

augment(x, data=NULL, dimensions=c(1,2), which="row", scaling=which, ...)

Arguments

x

an object returned by a function performing Principal Component Analysis.

...

ignored.

data

the original dataset, to be concatenated with the output when extracting row scores. When NULL (the default), the data are extracted from the PCA object if it contains them (i.e. for all supported functions except prcomp).

dimensions

vector giving the indices of the principal components to extract. Typically, two are extracted to create a plot.

which

the type of coordinates in the new space to extract: either "rows", "lines", "observations", "objects", "individuals", "sites" (which are all treated as synonyms) or "columns", "variables", "descriptors", "species" (which are, again, synonyms). All can be abbreviated. By default, row coordinates are returned. Row coordinates are commonly called 'scores' and column coordinates 'loadings'.

scaling

scaling for the scores. Can be

"none" (or 0)

for raw scores,

"rows" (or 1, or a synonym of "rows")

to scale row scores according to the eigenvalues (see Details),

"columns" (or 2, or a synonym of "columns")

to scale column scores according to the eigenvalues (see Details),

"both" (or 3)

to scale both row and column scores.

By default, scaling is adapted to the type of scores extracted (scaling 1 for row scores, scaling 2 for column scores, and scaling 3 when scores are extracted for a biplot).
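The synonym and abbreviation handling described for `which` can be sketched in base R with `pmatch()`, which accepts exact matches and unique partial matches. This is a minimal illustration, not necessarily how the package resolves `which`; the helper `resolve_which()` is hypothetical. Note that an abbreviation shared by synonyms of different types (e.g. "s" for both "sites" and "species") is ambiguous and is rejected.

```r
# Hypothetical helper: map a user-supplied `which` value to "row" or "col"
resolve_which <- function(which) {
  row_syn <- c("rows", "lines", "observations", "objects", "individuals", "sites")
  col_syn <- c("columns", "variables", "descriptors", "species")
  all_syn <- c(row_syn, col_syn)
  # pmatch() returns the index of an exact or unique partial match, NA otherwise
  i <- pmatch(which, all_syn)
  if (is.na(i)) {
    stop("`which` must be a row or column synonym, or an unambiguous abbreviation of one")
  }
  if (i <= length(row_syn)) "row" else "col"
}

resolve_which("obs")  # "row"
resolve_which("var")  # "col"
```

Because exact matches take precedence in `pmatch()`, full names such as "sites" and "species" always resolve, even though their common prefix "s" does not.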

Details

Scaling of scores follows the conventions of package FactoMineR. In summary, scaling 0 yields unscaled scores; in scaling 1, row scores are multiplied by

sqrt(n * eig)

where n is the number of active rows in the ordination and eig are the eigenvalues. In scaling 2, column scores are multiplied by

sqrt(eig)

In scaling 3, both row and column scores are scaled, as in scalings 1 and 2 respectively.
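The scaling conventions above can be checked numerically in base R. This sketch assumes that the raw (scaling 0) row and column scores are the left and right singular vectors of the standardized data, and that the eigenvalues follow the 1/n convention eig = d^2 / n, where d are the singular values; these are assumptions, not a description of the package internals. Under them, scaling-1 row scores reproduce the projections of the data on the PCs, and scaling-2 column scores are the correlations between variables and PCs. The data are standardized with the population standard deviation (divisor n) so that the correlation property holds exactly.

```r
# Standardize USArrests with the population SD (divisor n), as assumed above
X <- as.matrix(USArrests)
n <- nrow(X)
X <- scale(X) * sqrt(n / (n - 1))

s <- svd(X)
eig <- s$d^2 / n   # eigenvalues under the 1/n convention (assumption)

rows0 <- s$u       # scaling 0: raw row scores
cols0 <- s$v       # scaling 0: raw column scores

# scaling 1: multiply each row-score column by sqrt(n * eig)
rows1 <- sweep(rows0, 2, sqrt(n * eig), `*`)
# scaling 2: multiply each column-score column by sqrt(eig)
cols2 <- sweep(cols0, 2, sqrt(eig), `*`)
# scaling 3: use rows1 and cols2 together (e.g. for a biplot)
```

Under these assumptions, `rows1` equals `X %*% s$v` and `cols2` equals `cor(X, rows1)`, up to floating-point error.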

Value

For tidy, a data.frame containing the variance (i.e. eigenvalue), the proportion of variance, and the cumulative proportion of variance associated with each principal component.
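For a `prcomp` object, these quantities can be sketched in base R from the standard deviations of the components, since the variance of each PC is `sdev^2`. The column names below are hypothetical and may differ from those returned by tidy.

```r
pca <- prcomp(USArrests, scale. = TRUE)
eig <- pca$sdev^2  # variance (eigenvalue) of each PC

variances <- data.frame(
  PC = seq_along(eig),
  variance = eig,
  prop.variance = eig / sum(eig),             # share of the total variance
  cum.prop.variance = cumsum(eig) / sum(eig)  # running total of that share
)
variances
```

With standardized data, the total variance equals the number of variables (4 here), and the cumulative proportion reaches 1 on the last PC.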

For augment, a data.frame containing the original data (when which="rows" and data is supplied or can be extracted from the object) and the additional columns:

.rownames:

the identifier of the row or column, extracted from the row or column names in the original data.

.PC#:

the coordinates of data objects on the extracted principal components.

.cos2:

the squared cosine, summed over the extracted PCs, which quantifies the quality of the representation of each data point in the space of the extracted PCs. NB: cos2 can only be computed when all possible principal components are retained in the PCA object; otherwise, cos2 is NA. In several packages, the number of principal components to keep is an argument of the PCA function (and its default is not "all").

.contrib:

the contribution of each object to the selected PCs. NB: the same comment as for cos2 applies regarding the number of PCs kept in the PCA object.

.type:

the nature of the data extracted: row or col.
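The note on cos2 can be illustrated in base R: the squared cosine of a row is its squared coordinates on the extracted PCs divided by its squared distance to the centroid, and that distance is the sum of squared coordinates over all PCs. If the PCA object kept only some PCs, this denominator could not be computed from the scores alone. The contribution formula used here (squared coordinate on a PC divided by the sum of squared coordinates on that PC, in percent) is a common definition and is assumed; how the package aggregates contributions over several PCs is not stated above, so this sketch reports them per PC.

```r
pca <- prcomp(USArrests, scale. = TRUE)  # keeps all 4 PCs by default
scores <- pca$x
dims <- c(1, 2)                          # extracted PCs
sel <- scores[, dims, drop = FALSE]

# squared distance of each row to the centroid, using *all* PCs
d2 <- rowSums(scores^2)
# cos2 summed over the extracted PCs: quality of representation in that plane
cos2 <- rowSums(sel^2) / d2

# contribution (%) of each row to each extracted PC (assumed definition)
contrib <- sweep(sel^2, 2, colSums(sel^2), "/") * 100

head(cbind(cos2, contrib))
```

By construction, cos2 lies between 0 and 1, and the contributions to each PC sum to 100.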

See Also

Functions to perform PCA: prcomp in package stats, PCA in package FactoMineR, rda in package vegan, dudi.pca in package ade4, pca in package pcaMethods (on Bioconductor).

Other PCA.related.functions: autoplot_pca, ca_tidiers

Examples

pca <- prcomp(USArrests, scale. = TRUE)

tidy(pca)

head(augment(pca, which="row"))
head(augment(pca, which="col"))
# or use your preferred synonym, possibly abbreviated
head(augment(pca, which="obs"))
head(augment(pca, which="var"))
head(augment(pca, which="descriptors"))

# data is not contained in the `prcomp` object but can be provided
head(augment(pca, data=USArrests, which="row"))
# select different principal components
augment(pca, which="col", dimensions=c(2,3))

if (require("FactoMineR")) {
  pca <- FactoMineR::PCA(USArrests, graph=FALSE, ncp=4)
  head(augment(pca, which="individuals"))
  head(augment(pca, which="variables"))
}

if (require("vegan")) {
  pca <- vegan::rda(USArrests, scale=TRUE)
  # can use vegan's naming convention
  head(augment(pca, which="sites"))
  head(augment(pca, which="species"))
}

if (require("ade4")) {
  pca <- ade4::dudi.pca(USArrests, scannf=FALSE)
  head(augment(pca))
  head(augment(pca, which="variables"))
}

if (require("pcaMethods")) {
  pca <- pcaMethods::pca(USArrests, scale="uv")
  head(augment(pca))
  augment(pca, which="var")
}

jiho/autoplot documentation built on May 19, 2019, 9:29 a.m.