ggbiplot: Biplots following the grammar of graphics

View source: R/biplot.r

ggbiplotR Documentation

Biplots following the grammar of graphics

Description

Build a biplot visualization from ordination data wrapped as a tbl_ord object.

Usage

ggbiplot(
  ordination = NULL,
  mapping = aes(x = 1, y = 2),
  axis.type = "interpolative",
  xlim = NULL,
  ylim = NULL,
  expand = TRUE,
  clip = "on",
  axis.percents = TRUE,
  sec.axes = NULL,
  scale.factor = NULL,
  scale_rows = NULL,
  scale_cols = NULL,
  ...
)

ord_aes(ordination, ...)

Arguments

ordination

A tbl_ord.

mapping

List of default aesthetic mappings to use for the biplot. The default assigns the first two coordinates to the aesthetics x and y. Other assignments must be supplied in each layer added to the plot.

axis.type

Character, partially matched; whether to build an "interpolative" (the default) or a "predictive" biplot. The latter requires that x and y are mapped to shared coordinates, that no other shared coordinates are mapped to, and inertia is conferred entirely onto one matrix factor. NB: This option is only implemented for linear techniques (ED, SVD, & PCA).

xlim, ylim

Limits for the x and y axes.

expand

If TRUE, the default, adds a small expansion factor to the limits to ensure that data and axes don't overlap. If FALSE, limits are taken exactly from the data or xlim/ylim.

clip

Should drawing be clipped to the extent of the plot panel? A setting of "on" (the default) means yes, and a setting of "off" means no. In most cases, the default of "on" should not be changed, as setting clip = "off" can cause unexpected results. It allows drawing of data points anywhere on the plot, including in the plot margins. If limits are set via xlim and ylim and some data points fall outside those limits, then those data points may show up in places such as the axes, the legend, the plot title, or the plot margins.

axis.percents

Whether to concatenate default axis labels with inertia percentages.

sec.axes

Matrix factor character to specify a secondary set of axes.

scale.factor

Numeric value used to scale the secondary axes against the primary axes; ignored if sec.axes is not specified.

scale_rows, scale_cols

Either the character name of a numeric variable in get_*(ordination) or a numeric vector of length nrow(get_*(ordination)), used to scale the coordinates of the matrix factors.

...

Additional arguments passed to ggplot2::fortify(); see fortify.tbl_ord().

Details

ggbiplot() produces a ggplot object from a tbl_ord object ordination. The baseline object is the default unadorned "ggplot"-class object p with the following differences from what ggplot2::ggplot() returns:

  • p$mapping is augmented with .matrix = .matrix, which expects either .matrix = "rows" or .matrix = "cols" from the biplot.

  • p$coordinates is defaulted to ggplot2::coord_equal() in order to faithfully render the geometry of an ordination. The optional parameters xlim, ylim, expand, and clip are passed to coord_equal() and default to its ggplot2 defaults.

  • When x or y are mapped to coordinates of ordination, and if axis.percents is TRUE, p$labels$x or p$labels$y are defaulted to the coordinate names concatenated with the percentages of inertia captured by the coordinates.

  • p is assigned the class "ggbiplot" in addition to "ggplot". This serves no functional purpose currently.

Furthermore, the user may feed single integer values to the x and y aesthetics, which will be interpreted as the corresponding coordinates in the ordination. Currently only 2-dimensional biplots are supported, so both x and y must take coordinate values.

ord_aes() is a convenience function that generates a full-rank set of coordinate aesthetics ..coord1, ..coord2, etc. mapped to the shared coordinates of the ordination object, along with any additional aesthetics that are processed internally by ggplot2::aes().

The axis.type parameter controls whether the biplot is interpolative or predictive, though predictive biplots are still experimental and limited to linear methods like PCA. Gower & Hand (1996) and Gower, Gardner–Lubbe, & le Roux (2011) thoroughly explain the construction and interpretation of predictive biplots.

Value

A ggplot object.

Biplot layers

ggbiplot() uses ggplot2::fortify() internally to produce a single data frame with a .matrix column distinguishing the subjects ("rows") and variables ("cols"). The stat layers stat_rows() and stat_cols() simply filter the data frame to one of these two.

The geom layers geom_rows_*() and geom_cols_*() call the corresponding stat in order to render plot elements for the corresponding factor matrix. geom_dims_*() selects a default matrix based on common practice, e.g. points for rows and arrows for columns.

References

Gower JC & Hand DJ (1996) Biplots. Chapman & Hall, ISBN: 0-412-71630-5.

Gower JC, Gardner–Lubbe S, & le Roux NJ (2011) Understanding Biplots. Wiley, ISBN: 978-0-470-01255-0. https://www.wiley.com/go/biplots

See Also

ggplot2::ggplot2(), on which ggbiplot() is built

Examples

# compute PCA of Anderson iris measurements
iris[, -5] %>%
  princomp(cor = TRUE) %>%
  as_tbl_ord() %>%
  confer_inertia(1) %>%
  mutate_rows(species = iris$Species) %>%
  mutate_cols(measure = gsub("\\.", " ", tolower(names(iris)[-5]))) %>%
  print() -> iris_pca

# row-principal biplot with rescaled secondary axis
iris_pca %>%
  ggbiplot(aes(color = species), sec.axes = "cols", scale.factor = 2) +
  theme_bw() +
  scale_color_brewer(type = "qual", palette = 2) +
  geom_rows_point() +
  geom_cols_vector(color = "#444444") +
  geom_cols_text_radiate(aes(label = measure), color = "#444444") +
  ggtitle(
    "Row-principal PCA biplot of Anderson iris measurements",
    "Variable loadings scaled to secondary axes"
  ) +
  expand_limits(y = c(-1, 3.5))
# Performance measures can be regressed on the artificial coordinates of
# ordinated vehicle specs. Because the ordination of specs ignores performance,
# these coordinates will probably not be highly predictive. The gradient of each
# performance measure along the artificial axes is visualized by projecting the
# regression coefficients onto the ordination biplot.

# scaled principal components analysis of vehicle specs
mtcars_specs_pca <- ordinate(
  mtcars, cols = c(cyl, disp, hp, drat, wt, vs, carb),
  model = ~ princomp(., cor = TRUE)
)
# data frame of vehicle performance measures
mtcars %>%
  subset(select = c(mpg, qsec)) %>%
  as.matrix() %>%
  print() -> mtcars_perf
# regress performance measures on principal components
lm(mtcars_perf ~ get_rows(mtcars_specs_pca)) %>%
  as_tbl_ord() %>%
  augment_ord() %>%
  print() -> mtcars_pca_lm
# regression biplot
ggbiplot(mtcars_specs_pca, aes(label = name),
         sec.axes = "rows", scale.factor = .5) +
  theme_minimal() +
  geom_rows_text(size = 3) +
  geom_cols_vector(data = mtcars_pca_lm) +
  geom_cols_text_radiate(data = mtcars_pca_lm) +
  expand_limits(x = c(-2.5, 2))

# multidimensional scaling based on a scaled cosine distance of vehicle specs
cosine_dist <- function(x) {
  x <- as.matrix(x)
  num <- x %*% t(x)
  denom_rt <- as.matrix(rowSums(x^2))
  denom <- sqrt(denom_rt %*% t(denom_rt))
  as.dist(1 - num / denom)
}
mtcars %>%
  subset(select = c(cyl, disp, hp, drat, wt, vs, carb)) %>%
  scale() %>%
  cosine_dist() %>%
  cmdscale() %>%
  as.data.frame() ->
  mtcars_specs_cmds
# names must be consistent with `cmdscale_ord()` below
names(mtcars_specs_cmds) <- c("PCo1", "PCo2")
# regress performance measures on principal coordinates
lm(mtcars_perf ~ as.matrix(mtcars_specs_cmds)) %>%
  as_tbl_ord() %>%
  augment_ord() %>%
  print() -> mtcars_cmds_lm
# multidimensional scaling using `cmdscale_ord()`
mtcars %>%
  subset(select = c(cyl, disp, hp, drat, wt, vs, carb)) %>%
  scale() %>%
  cosine_dist() %>%
  cmdscale_ord() %>%
  as_tbl_ord() %>%
  augment_ord() %>%
  print() -> mtcars_specs_cmds_ord
# regression biplot
ggbiplot(mtcars_specs_cmds_ord, aes(label = name),
         sec.axes = "rows", scale.factor = 3) +
  theme_minimal() +
  geom_rows_text(size = 3) +
  geom_cols_vector(data = mtcars_cmds_lm) +
  geom_cols_text_radiate(data = mtcars_cmds_lm) +
  expand_limits(x = c(-2.25, 1.25), y = c(-2, 1.5))
# PCA of iris data
iris_pca <- ordinate(iris, cols = 1:4, prcomp, scale = TRUE)

# row-principal predictive biplot
iris_pca %>%
  augment_ord() %>%
  ggbiplot(axis.type = "predictive") +
  theme_bw() +
  scale_color_brewer(type = "qual", palette = 2) +
  geom_cols_axis(aes(label = name, center = center, scale = scale)) +
  geom_rows_point(aes(color = Species), alpha = .5) +
  ggtitle("Predictive biplot of Anderson iris measurements")

ordr documentation built on Oct. 21, 2022, 1:07 a.m.