pca_biplot: Make the snazziest of biplots!

View source: R/biplot.R

Description

pca_biplot makes a beautiful biplot of your data!

Usage

pca_biplot(data, data_cols = 1:ncol(data), center_data = TRUE,
  scale_data = FALSE, data_projection = "pc_scores",
  variable_projection = "loadings", chart_title = FALSE,
  xaxis_pc = 1, yaxis_pc = 2, limits_nudge_x = 0,
  limits_nudge_y = 0, points = TRUE, point_color = NULL,
  point_labels = FALSE, point_label_size = 3.5,
  point_labels_nudge_y = 0.5, arrows = TRUE, arrow_labels = FALSE,
  arrow_labels_nudge_y = 0.5, arrow_label_size = 3.5,
  arrow_legend = FALSE, arrow_scaling_factor = 1,
  use_ggrepel = FALSE)

Arguments

data

A matrix or data.frame of your data. Rows will be points, columns will be vectors.

data_cols

A vector of columns to treat as data for the PCA.

center_data

This is an important step for PCA! Just leave it as TRUE unless you really know what you're doing.

scale_data

Do you want to scale the data before doing PCA?

data_projection

How do you want to project your data? Options include "pc_scores" (projections of data into PC space aka principal components) and "pc_scores_scaled" (the same scores, but scaled to unit variance).

variable_projection

How do you want to project your variables? Options include "loadings" (the variable loadings) and "axes" (the principal axes).
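To make the data_projection and variable_projection options more concrete, here is a minimal sketch using base R's prcomp() on the iris data. It is not biplotr's internal code. It assumes one common convention: the principal axes are the unit-length eigenvectors, and the loadings are those axes scaled by the standard deviation of each principal component. The variable names (axes, loadings, pc_scores, pc_scores_scaled) are chosen for illustration only.

```r
# Sketch with base R's prcomp(); an illustration, not biplotr's implementation.
X <- as.matrix(iris[, 1:4])
pca <- prcomp(X, center = TRUE, scale. = FALSE)

# Principal axes: unit-length eigenvectors of the covariance matrix.
axes <- pca$rotation

# Loadings (one common convention, assumed here): each axis scaled by the
# standard deviation of its principal component.
loadings <- sweep(axes, 2, pca$sdev, `*`)

# PC scores: the centered data projected onto the principal axes.
pc_scores <- pca$x

# Scaled scores: each score column divided by its standard deviation,
# so that every column has unit variance.
pc_scores_scaled <- sweep(pc_scores, 2, pca$sdev, `/`)

round(colSums(axes^2), 6)                  # every axis has length 1
round(apply(pc_scores_scaled, 2, var), 6)  # every column has variance 1
```

Under this convention the loadings carry the variance information (loadings %*% t(loadings) reproduces the covariance matrix of the data), while the axes only carry direction.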

chart_title

Title of the chart.

xaxis_pc

Principal component to map to the x-axis.

yaxis_pc

Principal component to map to the y-axis.

limits_nudge_x

Add this value to both ends of the x-axis (can be negative!). If the original axis would run from -5 to 5, and you use a value of 1 for this parameter, then the axis will be shown running from -6 to 6 instead. If you use a -2 here, then the axis will run from -3 to 3 instead of -5 to 5.

limits_nudge_y

Add this value to both ends of the y-axis (can be negative!). See limits_nudge_x for more info.
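The arithmetic behind limits_nudge_x and limits_nudge_y can be sketched as follows. The helper nudge_limits is hypothetical and exists only for this illustration; it is not part of biplotr.

```r
# Hypothetical helper, for illustration only; not a biplotr function.
# Widens (positive nudge) or narrows (negative nudge) an axis range
# by the same amount at both ends.
nudge_limits <- function(limits, nudge) {
  c(limits[1] - nudge, limits[2] + nudge)
}

nudge_limits(c(-5, 5), 1)   # -6  6
nudge_limits(c(-5, 5), -2)  # -3  3
```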

points

Do you want to draw the points?

point_color

Column name (as a string) to color the points by.

point_labels

Do you want to label the points?

point_label_size

How big do you want the labels?

point_labels_nudge_y

Use this param to specify the amount to nudge the point label away from the point in the y direction.

arrows

Do you want to draw the arrows?

arrow_labels

Do you want to label the arrows?

arrow_labels_nudge_y

Use this param to specify the amount to nudge the arrow label away from the tip of the arrow in the y direction.

arrow_label_size

How big do you want the arrow labels?

arrow_legend

Put a snazzy legend to the side instead of labeling individual arrows.

arrow_scaling_factor

Arrows are automatically scaled to fit on the chart based on the data projection. If you want the arrows longer, pass a value > 1 here. If you want them to be shorter, pass a value > 0 but < 1. If you want it to decide automatically, leave it as is (or pass 1).

use_ggrepel

Set this to TRUE if you want to let ggrepel decide where the labels should go. If you use this option the label nudge options will be ignored.

Details

The data_cols parameter is used to subset the data.frame or matrix passed in to the data parameter. So, if you passed in 1:4, then the PCA would be calculated using the first four columns of data. If you passed in c(2:4, 6), then the PCA would be performed on columns 2, 3, 4, and 6 of data.
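The column selection described above behaves like ordinary R indexing. The sketch below uses base R only; iris6 and its Extra column are made up here so that a sixth column exists to select.

```r
# iris has five columns, so add a made-up sixth one for this illustration.
iris6 <- cbind(iris, Extra = seq_len(nrow(iris)))

# Selecting 1:4 keeps the first four columns.
first_four <- iris6[, 1:4]
colnames(first_four)

# Selecting c(2:4, 6) keeps columns 2, 3, 4, and 6 and skips Species (column 5).
picked <- iris6[, c(2:4, 6)]
colnames(picked)
```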

The scale_data parameter is used to scale the data before running PCA. You might want to scale your data if the magnitudes of your variables differ widely; otherwise, variables with much larger magnitudes than the rest will likely dominate the others. On the other hand, this might be what you want. It's up to you! (Note that you cannot scale the data if one of your variables is constant or zero.)
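The effect of scaling can be seen with base R's prcomp(), used here as a stand-in for whatever biplotr does internally (an assumption). One iris variable is multiplied by 1000 to mimic a variable measured in much smaller units, and a constant column is added to show why scaling fails in that case.

```r
X <- as.matrix(iris[, 1:4])

# Inflate one variable's magnitude to mimic a change of units.
X_big <- X
X_big[, "Sepal.Width"] <- X_big[, "Sepal.Width"] * 1000

unscaled <- prcomp(X_big, center = TRUE, scale. = FALSE)
scaled   <- prcomp(X_big, center = TRUE, scale. = TRUE)

# Without scaling, the first principal axis points almost entirely
# along the inflated variable.
round(abs(unscaled$rotation[, 1]), 3)

# With scaling, the inflated variable no longer dominates.
round(abs(scaled$rotation[, 1]), 3)

# A constant column has zero variance, so it cannot be rescaled to unit
# variance. prcomp() stops with an error, which is captured here as a string.
X_const <- cbind(X, Constant = 1)
scale_error <- tryCatch(
  prcomp(X_const, center = TRUE, scale. = TRUE),
  error = function(e) conditionMessage(e)
)
scale_error
```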

The point_color option is used to color the points by some column of the data. It is similar to the color parameter of the aes function in ggplot2, except that it must be provided as a string rather than a bare symbol. E.g., if you want to color the points by the Species column, use point_color = "Species" rather than point_color = Species. If the point_color column is also among the columns specified by data_cols, it will be removed from data_cols before the PCA is run.
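The removal of the color column from data_cols could look something like the sketch below. This is a guess at the logic using base R, not the package's actual code; color_idx and pca_cols are names invented for the illustration.

```r
data <- iris
data_cols <- 1:ncol(data)   # the default: all five columns
point_color <- "Species"    # passed as a string, not a bare symbol

# Find the position of the color column and drop it from the PCA columns.
color_idx <- match(point_color, colnames(data))
pca_cols <- setdiff(data_cols, color_idx)

pca_cols                     # 1 2 3 4
colnames(data)[pca_cols]     # only the four numeric measurement columns
```

This is why calling pca_biplot(iris, point_color = "Species") works even though the default data_cols includes the non-numeric Species column.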

The use_ggrepel parameter lets ggrepel decide automatically where the labels should go. This works really well for the points (unless there are a lot of them, in which case it is SLOW), but not as well for the arrows. It treats only the tip of the arrow as the point to avoid, so labels will often collide with the line part of the arrow.

The return value is a list with attributes biplot, pca, and plot_elem.

The biplot attribute contains the ggplot object. It's just like any other ggplot object, so, depending on how you originally called the pca_biplot function, you could customize it more to your liking (see examples).

The pca attribute contains all the PCA data wrapped in a list. This contains all you need to make whichever kind of biplots you like or to create your own custom biplot chart. For instance, the examples plot its pc_scores attribute directly.

The plot_elem attribute contains a list with a bunch of things to help the user customize the plot.

Value

A list with attributes biplot, pca, and plot_elem. (See Details for more info.)

Examples

# Import needed libraries.
library(ggplot2)
library(ggrepel)
library(grid)
library(gridExtra)
library(biplotr)

#### Here is a simple example of making a biplot using the iris dataset included in R.

chart <- pca_biplot(iris,
                    # Show the arrows
                    arrows = TRUE,

                    # Print the arrow labels
                    arrow_labels = TRUE,

                    # Sometimes the arrow labels are too close to or too far from the arrow tips.
                    # This moves the arrow labels away from the tips by 0.3 units, up or
                    # down, in the y-axis direction.
                    arrow_labels_nudge_y = 0.3,

                    # Color the points by the Species column.
                    point_color = "Species",

                    # Increase the x-axis limits a bit so the arrow labels don't get cut off.
                    limits_nudge_x = 1,

                    # Decrease the y-axis limits a bit as by default, it makes a square chart,
                    # but this data is not too spread out in the y-axis.
                    limits_nudge_y = -1)

# Show the plot!
print(chart$biplot)

#### Here is an example showing that the biplot attr of the return list is just
#### an ordinary ggplot object.

chart <- pca_biplot(iris, data_cols = 1:4, points = FALSE, arrows = FALSE)

# Print a chart with purple points.
print(chart$biplot + geom_point(color = "#aa2288"))

#### This example shows that you could use the pca attr of the return list to make your own plot.

chart <- pca_biplot(iris, data_cols = 1:4)

plot(chart$pca$pc_scores)

#### This example shows that you must specify data_cols if your input data has lots of extra
#### variables not meant to be in the PCA. (See the Vignettes for more info.)

iris2 <- cbind(
  iris,
  Group = c(
    rep(1, times = nrow(iris) / 2),
    rep(2, times = nrow(iris) / 2)
  )
)

chart <- pca_biplot(iris2, data_cols = 1:4, point_color = "Species")

#### This example shows how to use the items in the plot_elem return attr to build up your own
#### custom plots.

# First do the PCA
p <- pca_biplot(iris, point_color = "Species")

# This will print the normal biplot.
print(p$biplot)

# And this is more or less a recreation of the biplot contained in p$biplot using the items
# in p$plot_elem.
p$plot_elem$biplot_chart_base +
  biplotr::theme_amelia() +
  geom_point(aes(color = Species)) +
  geom_segment(data = p$plot_elem$loadings_df,
               mapping = aes(x = x, y = y, xend = xend, yend = yend),
               arrow = arrow(length = unit(0.25, "cm")))


#### This example shows the different types of biplots you can make.

loadings_scores <- pca_biplot(data = iris,
                              # Color points by "Species" column.
                              point_color = "Species",
                              arrow_labels = TRUE,
                              data_projection = "pc_scores",
                              variable_projection = "loadings")

axes_scores <- pca_biplot(data = iris,
                          # Color points by "Species" column.
                          point_color = "Species",
                          arrow_labels = TRUE,
                          data_projection = "pc_scores",
                          variable_projection = "axes")

loadings_scores_scaled <- pca_biplot(data = iris,
                                     # Color points by "Species" column.
                                     point_color = "Species",
                                     arrow_labels = TRUE,
                                     data_projection = "pc_scores_scaled",
                                     variable_projection = "loadings")

axes_scores_scaled <- pca_biplot(data = iris,
                                 # Color points by "Species" column.
                                 point_color = "Species",
                                 arrow_labels = TRUE,
                                 data_projection = "pc_scores_scaled",
                                 variable_projection = "axes")

# Note that the ggplot object is in the $biplot attribute.
grid.arrange(loadings_scores$biplot, axes_scores$biplot,
             loadings_scores_scaled$biplot, axes_scores_scaled$biplot,
             nrow = 2,
             ncol = 2)

#### Here is an example of making a biplot from the included team_shooting_mat dataset.
#### It includes examples for customizing the look of the chart.

# pca_biplot returns a list; the ggplot object is in its $biplot attribute
chart <- pca_biplot(
  # The data matrix
  data = team_shooting_mat,

  # Add a custom title
  chart_title = "NBA Team Shooting 2018",

  # Increase the x axis limits by 1 in + and - directions
  limits_nudge_x = 1,

  # Center the data (important for PCA)
  center_data = TRUE,
  # Scale the data (since some of our variables have much higher magnitude than others)
  scale_data = TRUE,

  # Show point labels
  point_labels = TRUE,
  # Push the labels 0.35 units on the y axis
  point_labels_nudge_y = 0.35,

  # Show arrow labels
  arrow_labels = TRUE,
  # Push arrow labels away from arrow heads by 0.35 units
  arrow_labels_nudge_y = 0.35
)

# Draw the plot
grid.arrange(chart$biplot)

mooreryan/biplotr documentation built on Sept. 2, 2020, 8:32 a.m.