Principal Components Analysis

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7, 
  fig.height = 7,
  out.width = NULL
)
library(dimensio)

Do PCA

## Load data
data(iris)
head(iris)

## Compute PCA
## (non-numeric variables are automatically removed)
X <- pca(iris, center = TRUE, scale = TRUE)
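To make the roles of center and scale concrete, here is a minimal base R sketch of a centered and scaled PCA. It is not dimensio's implementation: the object names (num, Z, dec, eig, scores) are illustrative, and the n - 1 divisor is an assumed convention, so values may differ from the package output by a constant factor.

```r
## Base R sketch of a centered and scaled PCA (not dimensio's internals)
num <- iris[, vapply(iris, is.numeric, logical(1))] # Keep numeric variables only
Z <- scale(num, center = TRUE, scale = TRUE)        # Mean 0 and unit variance

## Singular value decomposition of the standardized table
dec <- svd(Z)
eig <- dec$d^2 / (nrow(Z) - 1) # Eigenvalues of the correlation matrix
scores <- Z %*% dec$v          # Coordinates of the individuals on the axes

eig
head(scores)
```

Because every variable is scaled to unit variance, the eigenvalues sum to the number of variables.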

Explore the results

dimensio provides several methods to extract the results (get_*()) and to quickly visualize them (viz_*()):

## Get eigenvalues
get_eigenvalues(X)

## Scree plot
screeplot(X, cumulative = TRUE)
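The quantities behind a cumulative scree plot can be sketched in base R as follows. This assumes a PCA on the correlation matrix of the four numeric iris variables; the column names of the resulting data frame are illustrative and need not match what get_eigenvalues() returns.

```r
## Eigenvalues of the correlation matrix of the numeric iris variables
eig <- eigen(cor(iris[, 1:4]), symmetric = TRUE, only.values = TRUE)$values

variance <- 100 * eig / sum(eig) # Percentage of variance per component
cumulative <- cumsum(variance)   # Cumulative percentage of variance

data.frame(eigenvalue = eig, variance = variance, cumulative = cumulative)
```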

## Plot variable contributions to the definition of the first two axes
viz_contributions(X, margin = 2, axes = c(1, 2))
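As a rough guide to what is being plotted: with equally weighted variables, the contribution of a variable to an axis is the squared entry of the corresponding unit-length eigenvector, expressed as a percentage. The sketch below uses base R only; the names dec, V and contrib are illustrative, and dimensio may report the values in a different layout.

```r
## Eigen decomposition of the correlation matrix
dec <- eigen(cor(iris[, 1:4]), symmetric = TRUE)
V <- dec$vectors # Unit-length eigenvectors, one column per axis
rownames(V) <- colnames(iris)[1:4]

## Contribution (%) of each variable to each axis
contrib <- 100 * V^2
colnames(contrib) <- paste0("F", seq_len(ncol(contrib)))

round(contrib[, 1:2], 1)
```

For each axis, the contributions sum to 100.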

PCA biplot

A biplot is the simultaneous representation of the rows and columns of a rectangular dataset. It generalizes the scatterplot to multivariate data, displaying as much information as possible in a single graph [@greenacre2010].

dimensio can display two types of biplots: a form biplot (row-metric-preserving biplot) and a covariance biplot (column-metric-preserving biplot). See @greenacre2010 for more details about biplots.

The form biplot favors the representation of the individuals: the distance between two individuals approximates the Euclidean distance between the corresponding rows. In the form biplot, the length of a vector approximates the quality of the representation of the variable.

biplot(X, type = "form", label = "variables")
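The row-metric-preserving property can be checked with a short base R sketch. It assumes the usual SVD construction (rows in principal coordinates, columns in standard coordinates) and an n - 1 divisor; the object names are illustrative and this is not dimensio's code.

```r
## Standardized data and its singular value decomposition
X0 <- scale(iris[, 1:4], center = TRUE, scale = TRUE)
n <- nrow(X0)
dec <- svd(X0 / sqrt(n - 1))

rows_form <- dec$u %*% diag(dec$d) # Rows in principal coordinates
cols_form <- dec$v                 # Columns in standard coordinates

## With all axes kept, distances between row points match the
## (rescaled) Euclidean distances between the rows of the data
d_biplot <- dist(rows_form)
d_data <- dist(X0) / sqrt(n - 1)
all.equal(as.vector(d_biplot), as.vector(d_data))

## With two axes only, the match is an approximation
cor(as.vector(dist(rows_form[, 1:2])), as.vector(d_data))
```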

The covariance biplot favors the representation of the variables: the length of a vector approximates the standard deviation of the variable and the cosine of the angle formed by two vectors approximates the correlation between the two variables [@greenacre2010]. In the covariance biplot, the distance between two individuals approximates the Mahalanobis distance between the corresponding rows.

biplot(X, type = "covariance", label = "variables")
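These three properties hold exactly when all axes are kept, which the base R sketch below verifies. As an assumption made for illustration, the data are centered but not scaled, so that the standard deviations differ between variables; the object names are illustrative and the n - 1 divisor is a convention that dimensio may not share.

```r
## Centered (unscaled) data and its singular value decomposition
X0 <- scale(iris[, 1:4], center = TRUE, scale = FALSE)
n <- nrow(X0)
dec <- svd(X0 / sqrt(n - 1))

rows_cov <- dec$u * sqrt(n - 1)   # Rows in standard coordinates
cols_cov <- dec$v %*% diag(dec$d) # Columns in principal coordinates

## Vector lengths equal the standard deviations of the variables
len <- sqrt(rowSums(cols_cov^2))
all.equal(len, unname(apply(iris[, 1:4], 2, sd)))

## Cosines of the angles between vectors equal the correlations
cosines <- tcrossprod(cols_cov) / outer(len, len)
all.equal(cosines, unname(cor(iris[, 1:4])))

## Squared distances to the first row point equal squared Mahalanobis distances
d2_data <- mahalanobis(X0, center = X0[1, ], cov = cov(iris[, 1:4]))
d2_biplot <- rowSums(sweep(rows_cov, 2, rows_cov[1, ])^2)
all.equal(unname(d2_data), d2_biplot)
```

On the first two axes only, each equality becomes an approximation whose quality depends on the variance those axes retain.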

The main strength of biplots is also their main weakness: because they display a lot of information at once, they can quickly become difficult to read. It may then be preferable to visualize the results for individuals and variables separately.

Plot PCA loadings

viz_variables() depicts the variables as rays emanating from the origin; both their lengths and their directions matter for the interpretation.

## Plot variables factor map
viz_variables(X)

viz_variables() can highlight additional information by varying graphical elements (color, transparency, shape and size of symbols, etc.).

## Highlight cos2
viz_variables(
  x = X, 
  highlight = "cos2", 
  col = khroma::color("YlOrBr")(4, range = c(0.5, 1)),
  legend = list(x = "bottomleft")
)
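For a standardized PCA, the cos2 of a variable on an axis can be read as the squared correlation between that variable and the component. The base R sketch below illustrates this under that assumption; Z, pc and cos2 are illustrative names, not objects returned by dimensio.

```r
## Standardized variables and component scores
Z <- scale(iris[, 1:4])
pc <- Z %*% eigen(cor(iris[, 1:4]), symmetric = TRUE)$vectors

## Squared correlation between each variable and each axis
cos2 <- cor(Z, pc)^2
colnames(cos2) <- paste0("F", seq_len(ncol(cos2)))

round(cos2[, 1:2], 2)
rowSums(cos2[, 1:2]) # Quality of representation on the first factor plane
```

Summed over all axes, the cos2 of each variable equals 1, so a value close to 1 on the first plane means the variable is well represented there.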

Plot PCA scores

viz_individuals() displays the individuals and can highlight additional information.

## Plot individuals and color by species
viz_individuals(
  x = X,
  highlight = iris$Species,
  col = khroma::color("bright")(3), # Custom color scale
  pch = c(15, 16, 17), # Custom symbols
  legend = list(x = "bottomright")
)

## Add ellipses
viz_individuals(x = X)
viz_tolerance(x = X, group = iris$Species, level = 0.95,
              border = khroma::color("high contrast")(3))
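One common way to build such an ellipse is the normal-theory construction sketched below: the group mean gives the center, the group covariance gives the shape, and a chi-squared quantile gives the size. This is an assumption for illustration, and viz_tolerance() may compute its ellipses differently. The helper tolerance_ellipse() is hypothetical, and prcomp() scores stand in for the dimensio coordinates.

```r
## Stand-in for the first two principal coordinates (base R, not dimensio)
scores <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)$x[, 1:2]

## Hypothetical helper: points on a normal-theory ellipse for one group
tolerance_ellipse <- function(xy, level = 0.95, n = 100) {
  centre <- colMeans(xy)
  radius <- sqrt(stats::qchisq(level, df = 2))
  theta <- seq(0, 2 * pi, length.out = n)
  circle <- cbind(cos(theta), sin(theta))
  ## Stretch the unit circle with the Cholesky factor of the covariance
  sweep(radius * circle %*% chol(stats::cov(xy)), 2, centre, "+")
}

## Draw one ellipse per species
plot(scores, asp = 1, col = iris$Species, pch = 16)
for (sp in levels(iris$Species)) {
  lines(tolerance_ellipse(scores[iris$Species == sp, ]))
}
```

Every point on the ellipse lies at the same Mahalanobis distance from the group mean.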

## Add convex hull
viz_individuals(x = X)
viz_hull(x = X, group = iris$Species,
         border = khroma::color("high contrast")(3))
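The same idea can be sketched in base R with grDevices::chull(), which returns the indices of the points lying on the convex hull. The prcomp() scores are a stand-in for the dimensio coordinates, and the object names are illustrative.

```r
## Stand-in for the first two principal coordinates (base R, not dimensio)
scores <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)$x[, 1:2]

## Draw one convex hull per species
plot(scores, asp = 1, col = iris$Species, pch = 16)
for (sp in levels(iris$Species)) {
  xy <- scores[iris$Species == sp, ]
  hull <- grDevices::chull(xy) # Indices of the hull vertices
  polygon(xy[hull, ], border = "black")
}
```

A hull encloses every point of its group, so a single outlying individual can enlarge it considerably.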
## Highlight petal length
viz_individuals(
  x = X, 
  highlight = iris$Petal.Length,
  col = khroma::color("YlOrBr")(12), # Custom color scale
  cex = c(1, 2), # Custom size scale
  pch = 16,
  legend = list(x = "bottomleft")
)

## Highlight contributions
viz_individuals(
  x = X, 
  highlight = "contrib",
  col = khroma::color("iridescent")(12), # Custom color scale
  cex = c(1, 2), # Custom size scale
  pch = 16,
  legend = list(x = "bottomleft")
)

Custom plot

If you need more flexibility, the get_*() family and the tidy() and augment() functions extract the results as data frames, which you can then use to build custom graphs with base graphics or ggplot2.

iris_tidy <- tidy(X, margin = 2)
head(iris_tidy)

iris_augment <- augment(X, margin = 1)
head(iris_augment)

## Custom plot with ggplot2
ggplot2::ggplot(data = iris_augment) +
  ggplot2::aes(x = F1, y = F2, colour = contribution) +
  ggplot2::geom_vline(xintercept = 0, linewidth = 0.5, linetype = "dashed") +
  ggplot2::geom_hline(yintercept = 0, linewidth = 0.5, linetype = "dashed") +
  ggplot2::geom_point() +
  ggplot2::coord_fixed() + # Keep a 1:1 aspect ratio so that distances are not distorted
  ggplot2::theme_bw() +
  khroma::scale_color_iridescent()

References

Greenacre, M. J. (2010). Biplots in Practice. Bilbao: Fundación BBVA.

dimensio documentation built on Nov. 25, 2023, 1:08 a.m.