RunPCA: Principal component analysis

View source: R/Script_PLATE_08_PCA_0_RunPCA.R

RunPCA R Documentation

Principal component analysis

Description

Performs principal component analysis on splicing or gene expression data. This is a wrapper function for RunPCA.PSI and RunPCA.Exp.

Usage

RunPCA(
  MarvelObject,
  cell.group.column,
  cell.group.order = NULL,
  cell.group.colors = NULL,
  sample.ids = NULL,
  min.cells = 25,
  min.pct.events = NULL,
  features,
  point.size = 0.5,
  point.alpha = 0.75,
  point.stroke = 0.1,
  method.impute = "random",
  seed = 1,
  level,
  pcs = c(1, 2),
  mode = "pca",
  seed.umap = 42,
  npc.umap = 30,
  n.dim = 20,
  remove.outliers = FALSE,
  npc.elbow.plot = 50
)

Arguments

MarvelObject

Marvel object. S3 object generated from ComputePSI function.

cell.group.column

Character string. The name of the sample metadata column whose values will be used to label the cell groups on the PCA plot.

cell.group.order

Character strings. The order in which the values of the sample metadata column specified in cell.group.column appear in the PCA cell group legend.

cell.group.colors

Character strings. Vector of colors for the cell groups specified for PCA analysis using cell.group.column and cell.group.order. If not specified, default ggplot2 colors will be used.

sample.ids

Character strings. Specific cells to plot.

min.cells

Numeric value. The minimum number of cells expressing the splicing event or gene for that event or gene to be included in the analysis. Default value is 25.

features

Character string. Vector of tran_id or gene_id for analysis. Should match tran_id or gene_id column of MarvelObject$ValidatedSpliceFeature or MarvelObject$GeneFeature when level set to "splicing" or "gene", respectively.

point.size

Numeric value. Size of data points on reduced dimension space.

point.alpha

Numeric value. Transparency of the data points on reduced dimension space. Takes any value between 0 and 1. The smaller the value, the more transparent the data points will be.

point.stroke

Numeric value. The thickness of the outline of the data points. The larger the value, the thicker the outline of the data points.

method.impute

Character string. Only applicable when level set to "splicing". Indicates the method for imputing missing PSI values (low coverage). The "random" method assigns each missing PSI a random value between 0 and 1. The "Bayesian" method uses the posterior PSI computed from the ComputePSI.Posterior function. Default is "random".

seed

Numeric value. Only applicable when level set to "splicing". Ensures imputed values for NA PSIs are reproducible when method.impute option set to "random". Default value is 1.
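The interaction between method.impute = "random" and seed can be pictured with a small base R sketch. This is an illustration only, not MARVEL's internal code: the toy matrix psi and the helper impute_random are hypothetical names, and PSI is assumed to be on the 0-1 scale described above.

```r
# Illustrative sketch only; not MARVEL's implementation.
# Toy PSI matrix: rows are splicing events, columns are cells, NA = low coverage.
psi <- matrix(c(0.10, NA,   0.85,
                NA,   0.40, NA,
                1.00, 0.00, 0.55),
              nrow = 3, byrow = TRUE,
              dimnames = list(paste0("event", 1:3), paste0("cell", 1:3)))

# Hypothetical helper: fill missing PSI values with uniform draws from [0, 1].
impute_random <- function(x, seed = 1) {
  set.seed(seed)                 # fixing the RNG makes the fill-in values repeatable
  missing <- is.na(x)
  x[missing] <- runif(sum(missing), min = 0, max = 1)
  x
}

psi.a <- impute_random(psi, seed = 1)
psi.b <- impute_random(psi, seed = 1)

# Same seed, same imputed values; observed PSI values are left untouched.
identical(psi.a, psi.b)
```

Running the helper with a different seed would generally give different imputed values, which is why the seed is worth recording alongside the results.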

level

Character string. Indicate "splicing", "gene", or "integrated" for splicing analysis, gene expression analysis, or combined splicing and gene expression analysis, respectively. For "integrated", users should run both "splicing" and "gene" prior to running "integrated".

pcs

Numeric vector. The principal components (PCs) to plot. Default is the first two PCs, i.e., c(1, 2). If a vector of three PCs is specified, a 3D scatterplot is returned.

mode

Character string. Specify "pca" for linear dimension reduction analysis or "umap" for non-linear dimension reduction analysis. Specify "elbow.plot" to return eigenvalues. Default is "pca".

seed.umap

Numeric value. Only applicable when mode set to "umap". Ensures reproducibility of the analysis. Default value is 42.

npc.umap

Numeric value. Only applicable when level set to "splicing" or "gene". Indicates the number of top principal components to use for UMAP. Default value is 30, i.e., the first 30 PCs.

n.dim

Numeric value. Only applicable when level set to "integrated". Indicates the number of top principal components to use for UMAP. Default value is 20, i.e., the first 20 PCs.

remove.outliers

Logical value. If set to TRUE, outliers will be removed. Outliers are defined as data points beyond 1.5 times the interquartile range (IQR) from the 1st and 99th percentile. Default is FALSE.
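One possible reading of this outlier rule can be sketched in base R. The sketch assumes the cutoffs are the 1st percentile minus 1.5 x IQR and the 99th percentile plus 1.5 x IQR, applied to a single coordinate; the function name flag_outliers and the toy vector pc1 are hypothetical, and the rule MARVEL actually applies may differ in detail.

```r
# Illustrative sketch under a stated assumption; not MARVEL's implementation.
# Assumed rule: a point is an outlier if it lies more than 1.5 x IQR below the
# 1st percentile or more than 1.5 x IQR above the 99th percentile.
flag_outliers <- function(x) {
  q <- quantile(x, probs = c(0.01, 0.99), names = FALSE)
  spread <- 1.5 * IQR(x)
  x < q[1] - spread | x > q[2] + spread
}

# Toy PC1 coordinates: 99 well-behaved cells plus one extreme cell.
pc1 <- c(seq(-2, 2, length.out = 99), 50)

# Index of the flagged cell(s); only the extreme value at position 100 is flagged.
which(flag_outliers(pc1))
```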

npc.elbow.plot

Numeric value. Only applicable when mode set to "elbow.plot". Indicates the number of PCs to include in the elbow plot. Default value is 50.
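For readers unfamiliar with elbow plots, the sketch below shows where the eigenvalues come from, using base R's prcomp on simulated data. It is not MARVEL's code and the PCA backend MARVEL uses may differ; the objects x, pca, and elbow are hypothetical, and npc stands in for npc.elbow.plot.

```r
# Illustrative sketch only; uses stats::prcomp on simulated data.
set.seed(1)
x <- matrix(rnorm(40 * 10), nrow = 40, ncol = 10)  # 40 cells x 10 features

pca <- prcomp(x, center = TRUE, scale. = TRUE)

# Eigenvalues of the correlation matrix are the squared PC standard deviations.
eigenvalues <- pca$sdev^2

# Keep only the first npc components, analogous to npc.elbow.plot.
npc <- 5
elbow <- data.frame(pc = seq_len(npc),
                    eigenvalue = eigenvalues[seq_len(npc)])
elbow
```

Drawing the curve with plot(elbow$pc, elbow$eigenvalue, type = "b") and looking for the point where the eigenvalues level off is a common way to choose how many PCs to carry into downstream steps such as UMAP.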

min.pct.events

Numeric value. Only applicable when level set to "splicing". The minimum percentage of events expressed in a cell, above which, the cell will be retained for analysis. By default, this option is switched off, i.e., NULL.
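The two filtering thresholds, min.cells and min.pct.events, can be pictured with the base R sketch below. It is an illustration under stated assumptions rather than MARVEL's code: an event is treated as "expressed" in a cell when its PSI is not NA, events are filtered before cells, and the toy matrix and object names are hypothetical.

```r
# Illustrative sketch only; not MARVEL's implementation.
# Toy PSI matrix: rows are splicing events, columns are cells, NA = not expressed.
psi <- matrix(c(0.1, NA,  0.8, 0.3,
                NA,  NA,  0.5, NA,
                0.9, 0.2, 0.7, NA),
              nrow = 3, byrow = TRUE,
              dimnames = list(paste0("event", 1:3), paste0("cell", 1:4)))

min.cells <- 2        # minimum number of cells expressing an event
min.pct.events <- 50  # minimum percentage of events expressed in a cell

# Step 1: keep events expressed in at least min.cells cells.
keep.events <- rowSums(!is.na(psi)) >= min.cells
psi.events <- psi[keep.events, , drop = FALSE]

# Step 2: keep cells in which more than min.pct.events percent of the
# retained events are expressed.
pct.events <- 100 * colSums(!is.na(psi.events)) / nrow(psi.events)
keep.cells <- pct.events > min.pct.events
psi.filtered <- psi.events[, keep.cells, drop = FALSE]

dim(psi.filtered)
```

In this toy example event2 is dropped because only one cell expresses it, and cell2 and cell4 are dropped because they express exactly 50% of the remaining events, which is not above the threshold.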

Value

An object of class S3 with new slots MarvelObject$PCA$PSI$Results, MarvelObject$PCA$PSI$Plot, and MarvelObject$PCA$PSI$Plot.Elbow or MarvelObject$PCA$Exp$Results, MarvelObject$PCA$Exp$Plot, and MarvelObject$PCA$Exp$Plot.Elbow, when level option specified as "splicing" or "gene", respectively.

Examples

marvel.demo <- readRDS(system.file("extdata/data", "marvel.demo.rds", package="MARVEL"))

# Define splicing events for analysis
df <- do.call(rbind.data.frame, marvel.demo$PSI)
tran_ids <- df$tran_id

# PCA
marvel.demo <- RunPCA(MarvelObject=marvel.demo,
                      sample.ids=marvel.demo$SplicePheno$sample.id,
                      cell.group.column="cell.type",
                      cell.group.order=c("iPSC", "Endoderm"),
                      cell.group.colors=NULL,
                      min.cells=5,
                      features=tran_ids,
                      level="splicing",
                      point.size=2
                      )

# Check outputs
head(marvel.demo$PCA$PSI$Results$ind$coord)
marvel.demo$PCA$PSI$Plot

wenweixiong/MARVEL documentation built on Aug. 5, 2024, 2:54 p.m.