pca_biplot
Makes a beautiful biplot of your data!
pca_biplot(data, data_cols = 1:ncol(data), center_data = TRUE,
  scale_data = FALSE, data_projection = "pc_scores",
  variable_projection = "loadings", chart_title = FALSE,
  xaxis_pc = 1, yaxis_pc = 2, limits_nudge_x = 0,
  limits_nudge_y = 0, points = TRUE, point_color = NULL,
  point_labels = FALSE, point_label_size = 3.5,
  point_labels_nudge_y = 0.5, arrows = TRUE, arrow_labels = FALSE,
  arrow_labels_nudge_y = 0.5, arrow_label_size = 3.5,
  arrow_legend = FALSE, arrow_scaling_factor = 1,
  use_ggrepel = FALSE)
data |
A matrix or data.frame of your data. Rows will be points, columns will be vectors. |
data_cols |
A vector of columns to treat as data for the PCA. |
center_data |
This is an important step for PCA! Just leave it at its default, TRUE (see the usage above). |
scale_data |
Do you want to scale the data before doing PCA? |
data_projection |
How do you want to project your data? Options include "pc_scores" (projections of data into PC space aka principal components) and "pc_scores_scaled" (the same scores, but scaled to unit variance). |
variable_projection |
How do you want to project your variables? Options include "loadings" (the variable loadings) and "axes" (the principal axes). |
chart_title |
Title of the chart. |
xaxis_pc |
Principal component to map to the x-axis. |
yaxis_pc |
Principal component to map to the y-axis. |
limits_nudge_x |
Add this value to both ends of the x-axis (can be negative!). If the original axis would run from -5 to 5, and you use a value of 1 for this parameter, then the axis will be shown running from -6 to 6 instead. If you use a -2 here, then the axis will run from -3 to 3 instead of -5 to 5. |
limits_nudge_y |
Add this value to both ends of the y-axis (can be negative!). See limits_nudge_x for more info. |
points |
Do you want to draw the points? |
point_color |
Column name (as a string) to color the points by. |
point_labels |
Do you want to label the points? |
point_label_size |
How big do you want the labels? |
point_labels_nudge_y |
Use this param to specify the amount to nudge the point label away from the point in the y direction. |
arrows |
Do you want to draw the arrows? |
arrow_labels |
Do you want to label the arrows? |
arrow_labels_nudge_y |
Use this param to specify the amount to nudge the arrow label away from the tip of the arrow in the y direction. |
arrow_label_size |
How big do you want the labels? |
arrow_legend |
Put a snazzy legend to the side instead of labeling individual arrows. |
arrow_scaling_factor |
Arrows are automatically scaled to fit on the chart based on the data projection. If you want the arrows longer, pass a value > 1 here. If you want them to be shorter, pass a value > 0 but < 1. If you want it to decide automatically, leave it as is (or pass 1). |
use_ggrepel |
Set this to TRUE if you want to let ggrepel decide where the labels should go. If you use this option the label nudge options will be ignored. |
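The arithmetic behind limits_nudge_x and limits_nudge_y, as described in the argument list above, can be sketched in a few lines of base R. The helper nudge_limits is hypothetical (it is not part of biplotr) and only reproduces the worked numbers from the limits_nudge_x description; how pca_biplot applies the nudge internally is an assumption.

```r
# Hypothetical helper, not part of biplotr: widen (or narrow) an axis range
# by `nudge` units at both ends.
nudge_limits <- function(limits, nudge = 0) {
  c(limits[1] - nudge, limits[2] + nudge)
}

wider    <- nudge_limits(c(-5, 5), 1)   # axis would run from -6 to 6
narrower <- nudge_limits(c(-5, 5), -2)  # axis would run from -3 to 3

print(wider)
print(narrower)
```

A negative nudge larger than half the axis width would flip the limits, so small values are the safer choice.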
The data_cols parameter is used to subset the data.frame or matrix passed in to the data parameter. So, if you passed in 1:4, then the PCA would be calculated using the first four columns of data. If you passed in c(2:4, 6), then the PCA would be performed on columns 2, 3, 4, and 6 of data.
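To make the two selections concrete, here is a minimal base R sketch of the same column subsetting. It does not call pca_biplot; iris2 and its Group column are hypothetical additions, included only so that a sixth column exists.

```r
# Base R column subsetting, shown on the built-in iris data.
# iris2 adds a hypothetical sixth column so that c(2:4, 6) is a valid selection.
iris2 <- cbind(iris, Group = rep(c(1, 2), each = nrow(iris) / 2))

first_four <- iris2[, 1:4]        # columns 1, 2, 3, and 4
mixed      <- iris2[, c(2:4, 6)]  # columns 2, 3, 4, and 6

print(names(first_four))
print(names(mixed))
```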
The scale_data parameter is used to scale the data before running PCA. You might want to scale your data if the magnitudes of your variables differ widely; otherwise, variables with much larger magnitudes will likely dominate the others. On the other hand, this might be what you want. It's up to you! (Note that you cannot scale the data if one of your variables is constant or zero.)
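The effect of scaling can be seen with a small stand-in experiment. The sketch below uses stats::prcomp from base R rather than pca_biplot itself, and the choice of mtcars columns is only for illustration; it assumes that scale_data behaves like the scale. argument of prcomp, which the source does not confirm.

```r
# Stand-in for the PCA step: stats::prcomp() from base R, not pca_biplot.
x <- mtcars[, c("mpg", "disp", "hp", "wt")]

unscaled <- prcomp(x, center = TRUE, scale. = FALSE)
scaled   <- prcomp(x, center = TRUE, scale. = TRUE)

# Proportion of variance captured by each PC.
prop_var <- function(p) p$sdev^2 / sum(p$sdev^2)
print(round(prop_var(unscaled), 3))
print(round(prop_var(scaled), 3))

# Without scaling, the variable with the largest variance (disp) has the
# largest weight on PC1.
dominant <- names(which.max(abs(unscaled$rotation[, 1])))
print(dominant)

# Scaling fails when a column is constant, because its variance is zero.
x_const <- cbind(x, constant = 1)
scale_error <- tryCatch(
  prcomp(x_const, center = TRUE, scale. = TRUE),
  error = function(e) conditionMessage(e)
)
print(scale_error)
```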
The point_color option is used to color the points by some column of the data. It is similar to the color parameter of the aes function in ggplot, except that it must be provided as a string/character rather than a symbol. E.g., if you want to color the points by the Species column, use point_color = "Species" rather than point_color = Species. If point_color is also present in the columns specified by data_cols, it will be removed from the data_cols.
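The removal rule can be illustrated with a short base R sketch. This is only an assumption about how such a rule might be written; the variable names color_idx and pca_cols are hypothetical and pca_biplot's own implementation may differ.

```r
# Hypothetical illustration of the rule described above.
data        <- iris
data_cols   <- 1:ncol(data)  # default: all five columns, including Species
point_color <- "Species"     # a string, not a bare symbol

# Drop the color column from the PCA columns if it is present.
color_idx <- match(point_color, names(data))
pca_cols  <- setdiff(data_cols, color_idx)

print(pca_cols)
print(names(data)[pca_cols])
```

With the defaults shown, only the four numeric measurement columns remain for the PCA.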
The use_ggrepel parameter lets ggrepel decide automatically where the labels should go. This works really well for the points (unless there are a lot of them, in which case it is SLOW), but not as well for the arrows. It technically treats the tip of the arrow as the point to avoid, so labels will often collide with the line part of the arrow.
The return value is a list with attributes biplot, pca, and plot_elem.
The biplot attribute contains the ggplot object. It's just like any other ggplot object, so, depending on how you originally called the pca_biplot function, you could customize it more to your liking (see examples).
The pca attribute contains all the PCA data wrapped in a list. This contains all you need to make whichever kind of biplots you like or to create your own custom biplot chart. Its attributes are:
svals: singular values
lsvecs: left singular vectors
rsvec: right singular vectors
pc_variance: variance explained by the included PCs
cum_var: cumulative variance explained
pc_scores: matrix with the PC scores
pc_scores_scaled: matrix with the PC scores scaled to unit variance
variable_loadings: matrix with the variable loadings
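For orientation, the quantities in this list can be sketched from a singular value decomposition in base R. The variable names mirror the list above, but the formulas are common textbook conventions and are assumptions here, not biplotr's confirmed definitions; in particular, pc_variance is computed as a proportion of total variance.

```r
# A sketch using base R svd() on centered data; formulas are assumptions.
x <- scale(as.matrix(iris[, 1:4]), center = TRUE, scale = FALSE)
n <- nrow(x)

s <- svd(x)
svals  <- s$d  # singular values
lsvecs <- s$u  # left singular vectors
rsvec  <- s$v  # right singular vectors (principal axes)

pc_variance <- svals^2 / sum(svals^2)  # proportion of variance per PC
cum_var     <- cumsum(pc_variance)     # cumulative proportion

pc_scores         <- lsvecs %*% diag(svals)           # equals x %*% rsvec
pc_scores_scaled  <- sqrt(n - 1) * lsvecs             # scores with unit variance
variable_loadings <- rsvec %*% diag(svals) / sqrt(n - 1)

print(round(cum_var, 3))
print(round(apply(pc_scores_scaled, 2, var), 3))
```

Under these conventions, the scaled scores times the transposed loadings reproduce the centered data, which is the property that makes the two projections a matching pair on a biplot.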
The plot_elem attribute contains a list with a bunch of things to help the user customize the plot.
A list with attributes biplot, pca, and plot_elem. (See Details for more info.)
# Import needed libraries.
library(ggplot2)
library(ggrepel)
library(grid)
library(gridExtra)
library(biplotr)
#### Here is a simple example of making a biplot using the iris dataset included in R.
chart <- pca_biplot(iris,
                    # Show the arrows
                    arrows = TRUE,
                    # Print the arrow labels
                    arrow_labels = TRUE,
                    # Sometimes the arrow labels are too close or too far from the arrow tips.
                    # This moves the arrow labels away from the tips by 0.3 units
                    # along the y-axis.
                    arrow_labels_nudge_y = 0.3,
                    # Color the points by the Species column.
                    point_color = "Species",
                    # Increase the x-axis limits a bit so the arrow labels don't get cut off.
                    limits_nudge_x = 1,
                    # Decrease the y-axis limits a bit. By default the chart is square,
                    # but this data is not too spread out in the y-axis.
                    limits_nudge_y = -1)
# Show the plot!
print(chart$biplot)
#### Here is an example showing that the biplot attr of the return list is just
#### an ordinary ggplot object.
chart <- pca_biplot(iris, data_cols = 1:4, points = FALSE, arrows = FALSE)
# Print a chart with purple points.
print(chart$biplot + geom_point(color = "#aa2288"))
#### This example shows that you could use the pca attr of the return list to make your own plot.
chart <- pca_biplot(iris, data_cols = 1:4)
plot(chart$pca$pc_scores)
#### This example shows that you must specify data_cols if your input data has lots of extra
#### variables not meant to be in the PCA. (See the Vignettes for more info.)
iris2 <- cbind(
  iris,
  Group = c(
    rep(1, times = nrow(iris) / 2),
    rep(2, times = nrow(iris) / 2)
  )
)
chart <- pca_biplot(iris2, data_cols = 1:4, point_color = "Species")
#### This example shows how to use the items in the plot_elem return attr to build up your own
#### custom plots.
# First do the PCA
p <- pca_biplot(iris, point_color = "Species")
# This will print the normal biplot.
print(p$biplot)
# And this is more or less a recreation of the biplot contained in p$biplot using the items
# in p$plot_elem.
p$plot_elem$biplot_chart_base +
  biplotr::theme_amelia() +
  geom_point(aes(color = Species)) +
  geom_segment(data = p$plot_elem$loadings_df,
               mapping = aes(x = x, y = y, xend = xend, yend = yend),
               arrow = arrow(length = unit(0.25, "cm")))
#### This example shows the different types of biplots you can make.
loadings_scores <- pca_biplot(data = iris,
                              # Color points by "Species" column.
                              point_color = "Species",
                              arrow_labels = TRUE,
                              data_projection = "pc_scores",
                              variable_projection = "loadings")
axes_scores <- pca_biplot(data = iris,
                          # Color points by "Species" column.
                          point_color = "Species",
                          arrow_labels = TRUE,
                          data_projection = "pc_scores",
                          variable_projection = "axes")
loadings_scores_scaled <- pca_biplot(data = iris,
                                     # Color points by "Species" column.
                                     point_color = "Species",
                                     arrow_labels = TRUE,
                                     data_projection = "pc_scores_scaled",
                                     variable_projection = "loadings")
axes_scores_scaled <- pca_biplot(data = iris,
                                 # Color points by "Species" column.
                                 point_color = "Species",
                                 arrow_labels = TRUE,
                                 data_projection = "pc_scores_scaled",
                                 variable_projection = "axes")
# Note that the ggplot object is in the $biplot attribute.
grid.arrange(loadings_scores$biplot, axes_scores$biplot,
             loadings_scores_scaled$biplot, axes_scores_scaled$biplot,
             nrow = 2,
             ncol = 2)
#### Here is an example of making a biplot from the included team_shooting_mat dataset.
#### It includes examples for customizing the look of the chart.
# pca_biplot returns a list; its biplot attribute is a ggplot object
chart <- pca_biplot(
  # The data matrix
  data = team_shooting_mat,
  # Add a custom title
  chart_title = "NBA Team Shooting 2018",
  # Increase the x axis limits by 1 in + and - directions
  limits_nudge_x = 1,
  # Center the data (important for PCA)
  center_data = TRUE,
  # Scale the data (since some of our variables have much higher magnitude than others)
  scale_data = TRUE,
  # Show point labels
  point_labels = TRUE,
  # Push the labels 0.35 units on the y axis
  point_labels_nudge_y = 0.35,
  # Show arrow labels
  arrow_labels = TRUE,
  # Push arrow labels away from arrow heads by 0.35 units
  arrow_labels_nudge_y = 0.35
)
# Draw the plot
grid.arrange(chart$biplot)