run.pca: Run the PCA algorithm (using stats::prcomp)


View source: R/run.pca.R

Description

Method to run a principal component analysis (PCA), a dimensionality reduction algorithm. A PCA reduces the number of dimensions (i.e. parameters) in a dataset while retaining as much of its variation as possible. This function runs the PCA calculation itself, which is extremely fast, and then generates plots, which take longer. For individuals (such as samples or patients), a PCA can group them based on their similarities. A PCA can also quickly rank variables/parameters (such as markers or cell counts) by their contribution to the variability across a dataset. In cytometry, this can help identify the marker(s) that differentiate between subset(s) of cells. Uses the base R package "stats" for the PCA, "factoextra" for PCA and scree plots, "data.table" for saving .csv files, "ggplot2" for saving plots, "gtools" for rearranging data order, and "RColorBrewer" and "viridis" for colour schemes. More information on PCA plots can be found here: http://www.sthda.com/english/articles/31-principal-component-methods-in-r-practical-guide/118-principal-component-analysis-in-r-prcomp-vs-princomp/.
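As a minimal sketch of the core calculation (base R only; the simulated `markers` data frame and its column names are hypothetical, not Spectre's demonstration data), `stats::prcomp` returns component scores for the individuals and loadings for the variables. The share of total variance explained by each component is what a scree plot displays:

```r
# Minimal PCA sketch with stats::prcomp on simulated data (hypothetical markers)
set.seed(1)
n <- 200
cd4 <- rnorm(n)
markers <- data.frame(
  CD4  = cd4,
  CD8  = -cd4 + rnorm(n, sd = 0.3),   # strongly (negatively) correlated with CD4
  CD19 = rnorm(n),
  CD3  = rnorm(n, mean = 5, sd = 10)  # large scale; scaling stops it dominating
)

# scale. = TRUE standardises each column to unit variance before the PCA
pca <- stats::prcomp(markers, center = TRUE, scale. = TRUE)

# Proportion of total variance explained by each PC (the values a scree plot shows)
var.explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(var.explained, 3))

# Scores for individuals (rows) and loadings for variables (columns)
print(head(pca$x[, 1:2]))
print(pca$rotation)
```

Scaling matters here: without `scale. = TRUE`, the `CD3` column would dominate the first component simply because of its much larger variance.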

Usage

run.pca(dat, use.cols, scale = TRUE, add.pca.col = FALSE, pca.col.no = 50, pca.lite = FALSE, scree.plot = TRUE, variable.contribution = TRUE, plot.individuals = TRUE, plot.ind.label = "point", pointsize.ind = 1.5, row.names = NULL, plot.ind.group = FALSE, group.ind = NULL, colour.group = "viridis", pointsize.group = 1.5, ellipse.type = "confidence", ellipse.level = 0.95, mean.point = TRUE, randomise.order = TRUE, order.seed = 42, plot.variables = TRUE, colour.var = "solid", plot.combined = TRUE, repel = FALSE, var.numb = 20, path = getwd())

Arguments

dat

NO DEFAULT. A data.frame containing the data to be analysed.

use.cols

NO DEFAULT. Numeric vector of the column indices to use for dimensionality reduction. Leave out columns that should not contribute, such as "Time" or "Sample".

scale

DEFAULT = TRUE. A logical value indicating whether the variables should be scaled to have unit variance before the analysis takes place.

add.pca.col

DEFAULT = FALSE. Option to add PC coordinates to input data.

pca.col.no

DEFAULT = 50. Number of PCs to add to the input data.

pca.lite

DEFAULT = FALSE. If TRUE, the function stops once the PC coordinates have been added to the input data (assuming add.pca.col = TRUE).

scree.plot

DEFAULT = TRUE. Option to create and save a scree plot. Note that this requires the user to enter an elbow point interactively while the function runs.

variable.contribution

DEFAULT = TRUE. Option to create a plot showing the contribution of each variable. The horizontal red line represents the average contribution each variable would make if all variables contributed equally. Requires scree.plot = TRUE.

plot.individuals

DEFAULT = TRUE. Option to create PCA plots on individuals (samples/patients).

plot.ind.label

DEFAULT = "point". Option to add text labels to the PCA plots of individuals as an extra identifier. Use c("point", "text") to show both points and text.

pointsize.ind

DEFAULT = 1.5. Numeric. Size of dots of individuals on PCA plot.

row.names

DEFAULT = NULL. Name of the column (as a character string) that identifies individuals. Its values are used as labels on the PCA plot of individuals (plot.individuals).

plot.ind.group

DEFAULT = FALSE. Option to group individuals with ellipses (which by default show the 95 % confidence interval). The column that groups individuals must be specified with group.ind.

group.ind

DEFAULT = NULL. Name of the column (as a character string) that defines groups of individuals. Works with plot.ind.group, which must be set to TRUE.

colour.group

DEFAULT = "viridis". Colour scheme for each group. Options include "jet", "spectral", "viridis", "inferno", "magma".

pointsize.group

DEFAULT = 1.5. Numeric. Size of the point shapes for grouped individuals on the PCA plot.

ellipse.type

DEFAULT = "confidence". Set type of ellipse. Options include "confidence", "convex", "concentration", "t", "norm", "euclid". See factoextra::fviz for more information.

ellipse.level

DEFAULT = 0.95. Level used to size the ellipses (0.95 corresponds to 95 %).

mean.point

DEFAULT = TRUE. Option to plot the mean point of each group on the grouped PCA plot.

randomise.order

DEFAULT = TRUE. Option to randomise the plotting order of individuals, so that overlapping points from one group are not always drawn on top of another.

order.seed

DEFAULT = 42. Set the seed for randomising plotting order of individuals.

plot.variables

DEFAULT = TRUE. Option to create PCA plots on variables (markers/cell counts).

colour.var

DEFAULT = "solid". Colour scheme for PCA plot with variables. Options include "solid", "jet", "spectral", "viridis", "inferno", "magma", "BuPu". Note some colours are pale and may not appear clearly on plot.

plot.combined

DEFAULT = TRUE. Option to create a combined PCA plot with both individuals and variables.

repel

DEFAULT = FALSE. Option to avoid overlapping text in PCA plots. Can greatly increase plot time if there is a large number of samples.

var.numb

DEFAULT = 20. Number of top-contributing variables to plot. The greater the number, the longer plotting takes.

path

DEFAULT = getwd(). The location in which to save plots. By default, plots are saved to the current working directory. Can be overridden.
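Several of the arguments above correspond to fairly simple calculations. The sketch below is a hypothetical base-R illustration, not Spectre's own code: it reuses the argument names `pca.col.no`, `var.numb` and `order.seed` as ordinary variables on simulated data. It computes each variable's percentage contribution to PC1 as squared loadings times 100 (which we believe matches the definition factoextra uses for `prcomp` output) and the equal-contribution reference behind the red line as 100 divided by the number of variables.

```r
# Hypothetical sketch of what several run.pca() options correspond to.
# Base R only; argument names are reused for illustration, NOT Spectre's code.
set.seed(1)
dat <- as.data.frame(matrix(rnorm(300 * 6), ncol = 6,
                            dimnames = list(NULL, paste0("Marker", 1:6))))
dat$Marker2 <- dat$Marker1 + rnorm(300, sd = 0.2)  # make two markers co-vary
use.cols <- 1:6

pca <- stats::prcomp(dat[, use.cols], scale. = TRUE)

# add.pca.col / pca.col.no: append (at most) the first pca.col.no PC scores
pca.col.no <- 50
n.pc <- min(pca.col.no, ncol(pca$x))  # cannot add more PCs than exist
dat.pca <- cbind(dat, pca$x[, seq_len(n.pc), drop = FALSE])

# variable.contribution: % contribution of each variable to PC1.
# Loading vectors have unit length, so squared loadings x 100 sum to 100.
contrib.pc1 <- 100 * pca$rotation[, "PC1"]^2

# The "red line": the contribution of each variable if all contributed equally
expected <- 100 / length(use.cols)

# var.numb: keep only the top-contributing variables
var.numb <- 3
top.vars <- names(sort(contrib.pc1, decreasing = TRUE))[
  seq_len(min(var.numb, length(contrib.pc1)))]

# randomise.order / order.seed: shuffle rows reproducibly before plotting
order.seed <- 42
set.seed(order.seed)
plot.order <- dat.pca[sample(nrow(dat.pca)), ]

print(round(contrib.pc1, 1))
print(expected)
print(top.vars)
```

Because Marker1 and Marker2 co-vary strongly, they carry most of PC1 and sit well above the equal-contribution line, while the independent markers fall below it.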

Author(s)

Felix Marsh-Wakefield, felix.marsh-wakefield@sydney.edu.au

Examples

# Set the directory in which to save files (by default, files are saved to getwd())
# Change this path to a folder on your own system
setwd("/Users/felixmarsh-wakefield/Desktop")

# Run PCA on demonstration dataset, adding PCs to the dataset and stopping there (pca.lite = TRUE)
dat <- Spectre::demo.clustered
dat <- Spectre::run.pca(dat = dat,
                        use.cols = c(11:19),
                        add.pca.col = TRUE,
                        pca.lite = TRUE
                        )
                        
# Run PCA on demonstration dataset
Spectre::run.pca(dat = Spectre::demo.clustered,
                use.cols = c(11:19),
                repel = TRUE
                )

# Compare between groups
Spectre::run.pca(dat = Spectre::demo.clustered,
                 use.cols = c(11:19),
                 plot.ind.label = c("point", "text"), #individual cells will be labelled as numbers
                 plot.ind.group = TRUE,
                 group.ind = "Group",
                 mean.point = FALSE,
                 randomise.order = TRUE
                 )
        
# When prompted, type 5 and press Enter to continue the function (this selects the elbow point based on the scree plot)

## Possible issues ##
# Remove any rows containing NA values (na.omit() returns a new object, so assign it)
dat <- na.omit(dat)

# Remove columns that have zero variance (e.g. if MFI is the same for all samples for a marker)
dat <- data.table::as.data.table(dat)
dat <- dat[ , lapply(.SD, function(v) if(data.table::uniqueN(v, na.rm = TRUE) > 1) v)] #for data table format

# Ellipses are only generated in 'plot.ind.group' when there are at least 2 samples per group ('group.ind')
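If you prefer to stay with a plain data.frame, the same checks can be sketched in base R. This is an illustration on a hypothetical toy dataset, not part of Spectre: it removes rows with missing values, drops columns with fewer than two distinct non-missing values, and lists groups with fewer than two samples, which (per the note on ellipses) would not get an ellipse.

```r
# Base-R sketch of the checks above for a plain data.frame.
# The toy data and column names are hypothetical.
dat <- data.frame(
  Group   = c("A", "A", "B", "B", "C"),
  Marker1 = c(1.2, 3.4, NA, 2.2, 0.9),
  Marker2 = c(5, 5, 5, 5, 5),            # zero variance: same value everywhere
  Marker3 = c(0.1, 0.4, 0.3, 0.8, 0.5)
)

# Remove rows with any NA (assign the result; na.omit() does not modify in place)
dat <- na.omit(dat)

# Drop columns with fewer than two distinct non-NA values
keep <- vapply(dat, function(v) length(unique(v[!is.na(v)])) > 1, logical(1))
dat <- dat[, keep, drop = FALSE]

# Ellipses need at least 2 samples per group: list groups that are too small
group.sizes <- table(dat$Group)
too.small <- names(group.sizes)[group.sizes < 2]
print(names(dat))
print(too.small)
```

Here the row with the missing Marker1 value is dropped first, which is what leaves group "B" with a single sample.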

sydneycytometry/Spectre documentation built on March 20, 2021, 2:15 a.m.