plot_pca: plot_pca

View source: R/plot_pca.R

plot_pca    R Documentation

plot_pca

Description

Produces several ggplot2-based plots for reviewing a principal components analysis (PCA).

Given a data frame of samples (rows) and numeric measurements (columns), the function calls stats::prcomp() to perform a PCA of the data. In a regression application, for example, the columns could hold a set of measurements and the rows the samples. The function returns:

1. A prcomp class object returned from stats::prcomp().

2. A numeric vector of the percent of variance for each component.

3. A ggplot2 scatter plot object of samples across an x-y pair of principal components.

4. A ggplot2 circle plot object of the loadings or correlations of the measurements with an x-y pair of principal components.

5. A ggplot2 table of the loadings.

6. A ggplot2 object that assembles 3, 4, and 5 above into one figure.
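As a rough sketch of items 1 and 2, the base R code below calls stats::prcomp() directly and derives the percent of variance from the component standard deviations. The built-in iris data and the variable names are illustrative assumptions, and the formula is the conventional one; the package's internal computation may differ.

```r
# Illustrative data: the four numeric columns of the built-in iris data set.
measures <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")

# Centered and scaled PCA, comparable to calling plot_pca() with
# center = TRUE and scale. = TRUE.
pca <- stats::prcomp(iris[, measures], center = TRUE, scale. = TRUE)

# Percent of total variance carried by each component
# (assumed formula: squared standard deviations over their sum).
percent_var <- 100 * pca$sdev^2 / sum(pca$sdev^2)
names(percent_var) <- colnames(pca$rotation)

round(percent_var, 1)
```

The percentages are ordered from the first component to the last and sum to 100.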

Usage

plot_pca(
  df = NULL,
  measures = NULL,
  center = FALSE,
  scale. = FALSE,
  tol = NULL,
  rank. = NULL,
  pca_pair = c("PC1", "PC2"),
  pca_values = "loading",
  aes_fill = NULL,
  aes_label = NULL,
  title = NULL,
  subtitle = NULL,
  x_limits = NULL,
  x_major_breaks = waiver(),
  y_limits = NULL,
  y_major_breaks = waiver(),
  pts_color = "black",
  pts_fill = "white",
  pts_alpha = 1,
  pts_size = 1,
  figure_width = 10,
  header_font_sz = 9,
  show_meas_table = TRUE
)

Arguments

df

The data frame containing rows of observations across columns of numeric measurements.

measures

A vector of column names from 'df' to use in the PCA.

center

A logical indicating whether the variables should be shifted to be zero centered.

scale.

A logical indicating whether the variables should be scaled to have unit variance before the analysis takes place.

tol

A value indicating the magnitude below which components should be omitted. Components are omitted if their standard deviations are less than or equal to 'tol' times the standard deviation of the first component.
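The effect of 'tol' can be sketched by calling stats::prcomp() directly, since the argument is described as being passed through to it. The built-in USArrests data and the cutoff of 0.5 are illustrative assumptions only.

```r
# Illustrative data: the built-in USArrests data set (4 numeric columns).
full <- stats::prcomp(USArrests, center = TRUE, scale. = TRUE)

# Request that components with small standard deviations be omitted.
tol <- 0.5
trimmed <- stats::prcomp(USArrests, center = TRUE, scale. = TRUE, tol = tol)

# Components expected to survive: those whose standard deviation
# exceeds tol times the standard deviation of the first component.
n_expected <- sum(full$sdev > tol * full$sdev[1])

# Names of the components that were kept.
colnames(trimmed$rotation)
```

Comparing ncol(trimmed$rotation) with n_expected is a quick way to confirm which components a given 'tol' retains.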

rank.

A number specifying the maximal rank, i.e. the maximal number of principal components to be used. If NULL, the rank defaults to the length of the 'measures' argument.

pca_pair

A string vector that names the pair of components of interest. Acceptable values are "PC1", "PC2", "PC3", ...

pca_values

A string that sets the type of PCA values to display. Acceptable values are "loading" or "correlation".
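The difference between the two value types can be sketched in base R. This assumes that "loading" refers to the rotation matrix from stats::prcomp() and that "correlation" refers to the correlation between each measurement and each component's scores; how the package computes these internally is not confirmed here. The iris data is illustrative.

```r
# Illustrative data: the four numeric columns of the built-in iris data set.
X <- iris[, c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")]
pca <- stats::prcomp(X, center = TRUE, scale. = TRUE)

# Loadings: the eigenvector coefficients of each measurement per component.
loadings <- pca$rotation

# Correlations: each measurement against each component's scores.
correlations <- stats::cor(X, pca$x)

# For scaled data, a correlation equals the loading multiplied by the
# standard deviation of its component.
scaled_loadings <- sweep(loadings, 2, pca$sdev, "*")

round(loadings[, c("PC1", "PC2")], 3)
round(correlations[, c("PC1", "PC2")], 3)
```

Loadings in each column have unit length, while correlations are bounded by -1 and 1, so the two displays can differ noticeably in scale.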

aes_fill

A string that sets the variable name from 'df' for the aesthetic mapping for fill.

aes_label

A string that sets the variable name from 'df' for the aesthetic mapping for labeling observations.

title

A string that sets the plot title.

subtitle

A string that sets the plot subtitle.

x_limits

Depending on the class of 'measures', a numeric/Date/POSIXct 2 element vector that sets the minimum and maximum for the x axis. Use NA to refer to the existing minimum and maximum.

x_major_breaks

Depending on the class of 'measures', a numeric/Date/POSIXct vector or function that defines the exact major tick locations along the x axis.

y_limits

A numeric 2 element vector that sets the minimum and maximum for the y axis. Use NA to refer to the existing minimum and maximum.

y_major_breaks

A numeric vector or function that defines the exact major tick locations along the y axis.

pts_color

A string that sets the color of the points.

pts_fill

A string that sets the fill color of the points.

pts_alpha

A numeric value that sets the alpha level of 'pts_fill'.

pts_size

A numeric value that sets the size of the points.

figure_width

A numeric value that sets the width of the overall figure in inches.

header_font_sz

A numeric value that defines the font size (in pixels) of the table's headers.

show_meas_table

A logical that, if TRUE, displays the table of loadings/correlations.

Value

Returns a named list with:

  1. "pca" – A list object of class prcomp containing the results of the completed PCA.

  2. "percent_var" – A numeric vector showing the percent of variance for each component.

  3. "samp_plot" – A ggplot scatter plot object of samples across an x-y pair of principal components.

  4. "loadings_plot" – A ggplot plot object of the loadings or correlations of the measurements with an x-y pair of principal components.

  5. "loadings_table_plot" – A table showing the measurement loadings or correlations across all the principal components.

  6. "figure_plot" – A multi-paneled ggplot object that assembles "samp_plot", "loadings_plot", and "loadings_table_plot" into one figure.

Examples

library(ggplot2)
library(gtable)
library(ggplotify)
library(RplotterPkg)
library(RregressPkg)

# Use columns 2 through 7 of the pilots data as the PCA measurements
measurements <- colnames(RregressPkg::pilots)[2:7]
pilots_pca_lst <- RregressPkg::plot_pca(
  df = RregressPkg::pilots,
  measures = measurements,
  center = TRUE,
  scale. = TRUE,
  rank. = 4,
  aes_fill = "Group",
  pts_size = 2,
  x_limits = c(-4, 2),
  x_major_breaks = seq(-4, 2, 1),
  title = "Principal Components of Pilots and Apprentices",
  subtitle = "6 tested attributes from 20 pilots and 20 apprentices"
)
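# Extract individual elements from the returned named list
pca <- pilots_pca_lst$pca
pca_percent <- pilots_pca_lst$percent_var
figure_plot <- pilots_pca_lst$figure_plot

# Print the prcomp object (standard deviations and rotation)
pca

# Summarize the importance of each component
summary(pca)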


deandevl/RregressPkg documentation built on Feb. 5, 2025, 12:11 p.m.