plot_pca | R Documentation |
Produces several ggplot2-based plots for reviewing a principal components analysis (PCA).
Given a data frame of measurements (columns) and samples (rows), the function calls stats::prcomp() to perform a PCA of the data. For example, in a regression application the columns might hold a set of measurements and the rows the samples. The function returns:
1. An object of class prcomp returned from stats::prcomp().
2. A vector of the percent of variance explained by each component.
3. A ggplot2 scatter plot object of samples across an x-y pair of principal components.
4. A ggplot2 circle plot object of the loadings or correlations of the measurements with an x-y pair of principal components.
5. A ggplot2 table of the loadings.
6. A ggplot2 object that assembles 3, 4, and 5 above into one figure.
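The first two returned items come from stats::prcomp(). The sketch below reproduces those two steps in base R, using the built-in USArrests data as a stand-in for 'df'. It assumes the component percentages are the usual share of total variance, 100 * sdev^2 / sum(sdev^2). This page does not show how the package itself computes them. It also shows that rank. truncates the rotation (loadings) matrix while prcomp() keeps the standard deviation of every component.

```r
# Stand-in data: built-in USArrests (rows = states, columns = numeric measurements)
measures <- c("Murder", "Assault", "UrbanPop", "Rape")

# Run the PCA through stats::prcomp(), passing the same kinds of options plot_pca() exposes
pca <- stats::prcomp(USArrests[, measures], center = TRUE, scale. = TRUE, rank. = 2)

# rank. = 2 keeps only PC1 and PC2 in the rotation (loadings) matrix
dim(pca$rotation)

# sdev still holds the standard deviation of all 4 components.
# Assumed definition: percent of total variance explained by each component
percent_var <- 100 * pca$sdev^2 / sum(pca$sdev^2)
round(percent_var, 1)

# The same proportions (rounded to 5 digits) appear in summary(pca)
summary(pca)
```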
plot_pca(
  df = NULL,
  measures = NULL,
  center = FALSE,
  scale. = FALSE,
  tol = NULL,
  rank. = NULL,
  pca_pair = c("PC1", "PC2"),
  pca_values = "loading",
  aes_fill = NULL,
  aes_label = NULL,
  title = NULL,
  subtitle = NULL,
  x_limits = NULL,
  x_major_breaks = waiver(),
  y_limits = NULL,
  y_major_breaks = waiver(),
  pts_color = "black",
  pts_fill = "white",
  pts_alpha = 1,
  pts_size = 1,
  figure_width = 10,
  header_font_sz = 9,
  show_meas_table = TRUE
)
df |
The data frame containing rows of observations across columns of numeric measurements. |
measures |
A vector of column names from 'df' to use in the PCA. |
center |
A logical indicating whether the variables should be shifted to be zero centered. |
scale. |
A logical indicating whether the variables should be scaled to have unit variance before the analysis takes place. |
tol |
A value indicating the magnitude below which components should be omitted. Components are omitted if their standard deviations are less than or equal to 'tol' times the standard deviation of the first component. |
rank. |
A number specifying the maximal rank, i.e. the maximal number of principal components to be used. If NULL, the length of the 'measures' argument is used. |
pca_pair |
A string vector that names the pair of components of interest. Acceptable values are "PC1", "PC2", "PC3", ... |
pca_values |
A string that sets the type of PCA values to display. Acceptable values are "loading" or "correlation". |
aes_fill |
A string that sets the variable name from 'df' for the aesthetic mapping for fill. |
aes_label |
A string that sets the variable name from 'df' for the aesthetic mapping for labeling observations. |
title |
A string that sets the plot title. |
subtitle |
A string that sets the plot subtitle. |
x_limits |
Depending on the class of 'measures', a numeric/Date/POSIXct 2 element vector that sets the minimum
and maximum for the x axis. Use |
x_major_breaks |
Depending on the class of 'measures', a numeric/Date/POSIXct vector or function that defines the exact major tick locations along the x axis. |
y_limits |
A numeric 2 element vector that sets the minimum and maximum for the y axis.
Use |
y_major_breaks |
A numeric vector or function that defines the exact major tick locations along the y axis. |
pts_color |
A string that sets the color of the points. |
pts_fill |
A string that sets the fill color of the points. |
pts_alpha |
A numeric value that sets the alpha level of 'pts_fill'. |
pts_size |
A numeric value that sets the size of the points. |
figure_width |
A numeric that sets the width of the overall figure in inches. |
header_font_sz |
A numeric that defines the font size (in pixels) of the table's headers. |
show_meas_table |
A logical that if |
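The aes_fill, pca_pair and pts_* arguments control how samples are drawn in "samp_plot". The sketch below builds a comparable sample scatter plot directly from prcomp() scores with ggplot2. It is not the package's code. The iris data and its Species column stand in for 'df' and 'aes_fill'. The fillable point shape 21 is an assumption about how pts_color and pts_fill are applied.

```r
library(ggplot2)

# Stand-in data: iris measurements, with Species playing the role of 'aes_fill'
pca <- stats::prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)
scores <- data.frame(pca$x, Species = iris$Species)

pca_pair <- c("PC1", "PC2")  # same form as the pca_pair argument
aes_fill <- "Species"        # column name used for the fill mapping

# Shape 21 has a separate outline (color) and interior (fill);
# here fill is mapped to a column, and outline, alpha and size are fixed values
samp_plot <- ggplot(scores, aes(x = .data[[pca_pair[1]]], y = .data[[pca_pair[2]]],
                                fill = .data[[aes_fill]])) +
  geom_point(shape = 21, color = "black", alpha = 1, size = 2) +
  labs(x = pca_pair[1], y = pca_pair[2])

print(samp_plot)
```

Indexing the .data pronoun with strings lets the column names arrive as character arguments. This matches how aes_fill and pca_pair are passed to plot_pca().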
Returns a named list with:
"pca" – A list object of class prcomp containing the results of the completed PCA.
"percent_var" – A numeric vector showing the percent of variance for each component.
"samp_plot" – A ggplot scatter plot object of samples across an x-y pair of principal components.
"loadings_plot" – A ggplot plot object of the loadings or correlations of the measurements with an x-y pair of principal components.
"loadings_table_plot" – A table showing the measurement loadings or correlations across all the principal components.
"figure_plot" – A multi-paneled ggplot object that assembles "samp_plot", "loadings_plot", and "loadings_table_plot" into one figure.
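library(ggplot2)
library(gtable)
library(ggplotify)
library(RplotterPkg)
library(RregressPkg)

# Columns 2 through 7 of the pilots data are the 6 tested attributes used as measurements
measurements <- colnames(RregressPkg::pilots)[2:7]

# Center and scale the measurements, keep at most 4 components,
# fill the sample points by "Group", and set the x axis range and major breaks
pilots_pca_lst <- RregressPkg::plot_pca(
  df = RregressPkg::pilots,
  measures = measurements,
  center = TRUE,
  scale. = TRUE,
  rank. = 4,
  aes_fill = "Group",
  pts_size = 2,
  x_limits = c(-4, 2),
  x_major_breaks = seq(-4, 2, 1),
  title = "Principal Components of Pilots and Apprentices",
  subtitle = "6 tested attributes from 20 pilots and 20 apprentices"
)

# Extract the prcomp object, the percent of variance per component, and the assembled figure
pca <- pilots_pca_lst$pca
pca_percent <- pilots_pca_lst$percent_var
figure_plot <- pilots_pca_lst$figure_plot

# Print the PCA results and the variance summary
pca
summary(pca)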