run.pca | R Documentation |
Method to run a PCA dimensionality reduction algorithm. A principal component analysis (PCA) reduces the number of dimensions (i.e. parameters) while retaining most of the variation in a given dataset. This function runs the PCA calculation (extremely fast) and generates plots (which takes longer). For individuals (such as samples or patients), a PCA can group them based on their similarities. A PCA can also rank variables/parameters (such as markers or cell counts) by their contribution to the variability across a dataset. In cytometry, this can help identify marker(s) that differentiate between subset(s) of cells. Uses the base R package "stats" for PCA, "factoextra" for PCA and scree plots, "data.table" for saving .csv files, "ggplot2" for saving plots, "gtools" for rearranging data order, and "RColorBrewer" and "viridis" for colour schemes. More information on PCA plots can be found at http://www.sthda.com/english/articles/31-principal-component-methods-in-r-practical-guide/118-principal-component-analysis-in-r-prcomp-vs-princomp/.
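As a rough illustration of the underlying calculation (not the internal code of run.pca), the sketch below runs stats::prcomp directly. The built-in iris dataset and the variable names are assumptions chosen purely for demonstration. It shows how scaling is requested and how the proportion of variance per component, which is what a scree plot displays, can be obtained.

```r
# Minimal sketch, assuming the built-in 'iris' dataset as a stand-in for
# cytometry data; this is not the internal code of run.pca.
dat <- iris
use.cols <- 1:4  # numeric columns only (excludes "Species")

# scale. = TRUE scales each variable to unit variance before the PCA
pca <- stats::prcomp(dat[, use.cols], center = TRUE, scale. = TRUE)

# Proportion of variance explained by each component (shown in a scree plot)
var.explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(var.explained, 3))

# Coordinates of each individual on the first two components
pc.coords <- pca$x[, 1:2]
head(pc.coords)
```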
run.pca(dat, use.cols, scale = TRUE, add.pca.col = FALSE,
pca.col.no = 50, pca.lite = FALSE, scree.plot = TRUE, comp.no = 2,
variable.contribution = TRUE, plot.individuals = TRUE,
plot.ind.label = "point", pointsize.ind = 1.5, row.names = NULL,
plot.ind.group = FALSE, group.ind = NULL, colour.group = "viridis",
pointsize.group = 1.5, ellipse.type = "confidence",
ellipse.level = 0.95, mean.point = TRUE,
randomise.order = TRUE, order.seed = 42,
plot.variables = TRUE, colour.var = "solid",
plot.combined = TRUE, repel = FALSE, var.numb = 20, path = getwd())
dat |
NO DEFAULT. data.frame. |
use.cols |
NO DEFAULT. Vector of numbers, reflecting the columns to use for dimensionality reduction (you may want to exclude parameters such as "Time" or "Sample"). |
scale |
DEFAULT = TRUE. A logical value indicating whether the variables should be scaled to have unit variance before the analysis takes place. |
add.pca.col |
DEFAULT = FALSE. Option to add PC coordinates to input data. |
pca.col.no |
DEFAULT = 50. Number of PCs to be added to input data. |
pca.lite |
DEFAULT = FALSE. Will stop running the function after PCA coordinates have been added to input data (assuming add.pca.col = TRUE). |
scree.plot |
DEFAULT = TRUE. Option to create scree plots. Will save generated scree plot. Note this will require the input of an elbow point during run if comp.no = NULL. |
comp.no |
DEFAULT = 2. Select number of components to be saved. If NULL, user will be asked during run to select number based on scree plot. |
variable.contribution |
DEFAULT = TRUE. Option to create plot showing the contribution of each variable. Horizontal red line represents the average variable contribution if all variables contributed equally. Requires scree.plot = TRUE. |
plot.individuals |
DEFAULT = TRUE. Option to create PCA plots on individuals (samples/patients). |
plot.ind.label |
DEFAULT = "point". Option to add text to PCA plots on individuals as an extra identifier. Use c("point", "text") to include both text and point. |
pointsize.ind |
DEFAULT = 1.5. Numeric. Size of dots of individuals on PCA plot. |
row.names |
DEFAULT = NULL. Column (as character) that defines individuals. Will be used to place name on plot.individuals. |
plot.ind.group |
DEFAULT = FALSE. Option to group individuals with ellipses (which by default show the 95 % confidence interval). Must specify column that groups individuals with group.ind. |
group.ind |
DEFAULT = NULL. Column (as character) that defines groups of individuals. Works with plot.ind.group which must be set to TRUE. |
colour.group |
DEFAULT = "viridis". Colour scheme for each group. Options include "jet", "spectral", "viridis", "inferno", "magma". |
pointsize.group |
DEFAULT = 1.5. Numeric. Size of shapes of group individuals on PCA plot. |
ellipse.type |
DEFAULT = "confidence". Set type of ellipse. Options include "confidence", "convex", "concentration", "t", "norm", "euclid". See factoextra::fviz for more information. |
ellipse.level |
DEFAULT = 0.95. Size of ellipses. By default 95 % (0.95). |
mean.point |
DEFAULT = TRUE. Option to plot the mean on PCA plot with different groups. |
randomise.order |
DEFAULT = TRUE. Option to randomise plotting order of individuals to control for overlap. |
order.seed |
DEFAULT = 42. Set the seed for randomising plotting order of individuals. |
plot.variables |
DEFAULT = TRUE. Option to create PCA plots on variables (markers/cell counts). |
colour.var |
DEFAULT = "solid". Colour scheme for PCA plot with variables. Options include "solid", "jet", "spectral", "viridis", "inferno", "magma", "BuPu". Note some colours are pale and may not appear clearly on plot. |
plot.combined |
DEFAULT = TRUE. Option to create a combined PCA plot with both individuals and variables. |
repel |
DEFAULT = FALSE. Option to avoid overlapping text in PCA plots. Can greatly increase plot time if there is a large number of samples. |
var.numb |
DEFAULT = 20. Top number of variables to be plotted. Note the greater the number, the longer plots will take. |
path |
DEFAULT = getwd(). The location to save plots. By default, will save to current working directory. Can be overridden. |
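The reference line described for variable.contribution (equal contribution from all variables) corresponds to 100 divided by the number of variables. A minimal sketch of that idea follows, assuming the built-in iris dataset and computing contributions from squared prcomp loadings. That this matches the values plotted by factoextra is an assumption here, not something stated in this documentation.

```r
# Minimal sketch, assuming the built-in 'iris' dataset; not the internal
# code of run.pca.
pca <- stats::prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)

# Each column of 'rotation' has unit length, so its squared loadings sum to 1.
# Multiplying by 100 gives a percentage contribution per variable.
contrib <- 100 * pca$rotation^2

# Expected contribution if all variables contributed equally
expected <- 100 / nrow(pca$rotation)

# Variables contributing more than expected to the first component
names(which(contrib[, "PC1"] > expected))
```

With four variables the expected value is 25 %, so any variable above that line contributes more than its equal share to the component.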
Felix Marsh-Wakefield, felix.marsh-wakefield@sydney.edu.au
# Set directory to save files. By default, files are saved to getwd()
# Load demonstration dataset
dat <- Spectre::demo.clustered
# Run PCA on demonstration dataset
Spectre::run.pca(dat = Spectre::demo.clustered,
use.cols = c(11:19),
repel = TRUE
)
# Compare between groups
## Not run:
Spectre::run.pca(dat = Spectre::demo.clustered,
use.cols = c(11:19),
comp.no = NULL,
plot.ind.label = c("point", "text"), #individual cells will be labelled as numbers
plot.ind.group = TRUE,
group.ind = "Group",
mean.point = FALSE,
randomise.order = TRUE
)
## End(Not run)
# When prompted, type in "5" and click enter to continue function
# (this selects the elbow point based off the scree plot)
## Possible issues ##
# Remove any NA present
dat <- na.omit(dat)
# Remove columns that have zero variance (e.g. if MFI is the same for all
# samples for a marker)
dat <- data.table::as.data.table(dat)
dat <- dat[ , lapply(.SD, function(v) if(data.table::uniqueN(v, na.rm = TRUE) > 1) v)]
# Ellipses are only generated in 'plot.ind.group' when there are at least
# 2 samples per group ('group.ind')
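One way to check the group-size requirement before plotting is to count samples per group. The sketch below uses a hypothetical toy table; only the column name "Group" is taken from the examples above, and the other names are illustrative.

```r
# Minimal sketch with a hypothetical toy table; the column name "Group"
# mirrors the 'group.ind' value used in the examples above.
toy <- data.frame(
  Group   = c("A", "A", "B", "B", "C"),
  Marker1 = c(1.2, 1.5, 2.1, 2.4, 3.0)
)

# Count samples per group
group.sizes <- table(toy$Group)

# Groups with fewer than 2 samples will not get an ellipse
small.groups <- names(group.sizes[group.sizes < 2])
print(small.groups)

# Optionally keep only groups with at least 2 samples
toy.filtered <- toy[!toy$Group %in% small.groups, ]
```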