ClusterDEG: Normalize, scale, and regress out unwanted variation

View source: R/clusterdeg.R

Description

ClusterDEG runs SCTransform on a Seurat object, followed by RunPCA, RunTSNE, RunUMAP, and clustering. It also finds marker genes for each cluster and saves them as a table, along with a heatmap of the top 10 upregulated genes in each cluster.

Usage

ClusterDEG(scrna, outdir = ".", npcs = 30, res = 0.8, mnn = FALSE,
  skip.sct = FALSE, min.dist = 0.3, n.neighbors = 30,
  regress = NULL, ccpca = FALSE, test = "wilcox",
  logfc.thresh = 0.25, min.pct = 0.1)

Arguments

scrna

Seurat object.

outdir

Path to output directory.

npcs

Number of principal components to use for UMAP and clustering.

res

Numeric value denoting the resolution to use for clustering. Higher values generally yield more clusters. Values of 0.5-3 are sensible. Multiple values may be supplied as a vector; the resulting clusters are added as meta.data columns named Cluster_res_npcs, where res and npcs are the values of those arguments. The clusters derived from the last value in the vector are set as the default Ident for cells and are also stored in meta.data under 'seurat_clusters'.

mnn

Boolean indicating whether scrna was integrated with method="MNN" via SimpleIntegration. If so, this must be set to TRUE; otherwise the unintegrated PCA embeddings will be used for dimensionality reduction and clustering.

skip.sct

Boolean indicating whether to skip SCTransform. Set to TRUE if SimpleIntegration was used to integrate the Seurat object.

min.dist

Number that controls how tightly the embedding is allowed to compress points together in RunUMAP. Increasing it may be beneficial for large datasets.

n.neighbors

Integer that determines the number of neighboring points used in local approximations of manifold structure in RunUMAP. Values of 5-50 are considered sensible. Larger values preserve more global structure at the expense of detailed local structure.

regress

Character vector of meta.data variables to regress during data scaling.

ccpca

Boolean indicating whether a PCA using only cell cycle genes should be run. If so, it will be saved as a reduction named "cc". This is useful for comparison with earlier cell cycle gene PCAs when cell cycle scores were regressed out via regress.

test

String indicating which DE test to use for marker finding. Options are: "wilcox", "bimod", "roc", "t", "negbinom", "poisson", "LR", "MAST", "DESeq2". See FindAllMarkers.

logfc.thresh

Value that limits DE testing to genes that show, on average, at least X-fold difference (log-scale) between two groups of cells. Increasing this value speeds up the function at the cost of potentially missing weaker differences.

min.pct

Value that limits DE testing to genes detected in a minimum fraction of cells in either population.
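To illustrate how logfc.thresh and min.pct narrow the set of genes that are tested, here is a minimal base R sketch on made-up data. The helper prefilter_genes is hypothetical and is not part of EZscRNA or Seurat; it uses a plain difference of group means as the log-scale difference, which is a simplifying assumption and may not match how FindAllMarkers computes fold changes.

```r
# Toy log-normalized expression: 4 genes x 6 cells (values are made up).
expr <- matrix(
  c(2.0, 2.2, 1.8, 0.1, 0.0, 0.2,   # geneA: higher in group 1
    0.5, 0.6, 0.4, 0.5, 0.5, 0.6,   # geneB: no real difference
    0.0, 0.0, 0.0, 0.0, 0.0, 0.0,   # geneC: never detected
    0.0, 0.1, 0.0, 1.5, 1.4, 1.6),  # geneD: higher in group 2
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", LETTERS[1:4]), paste0("cell", 1:6))
)
group <- c(1, 1, 1, 2, 2, 2)

# Hypothetical helper (an assumption, not the package's code): keep genes that
# pass both the detection-rate filter and the log-scale difference filter.
prefilter_genes <- function(expr, group, logfc.thresh = 0.25, min.pct = 0.1) {
  g1 <- expr[, group == 1, drop = FALSE]
  g2 <- expr[, group == 2, drop = FALSE]
  pct1 <- rowMeans(g1 > 0)               # fraction of group 1 cells expressing
  pct2 <- rowMeans(g2 > 0)               # fraction of group 2 cells expressing
  diff <- rowMeans(g1) - rowMeans(g2)    # simplified log-scale difference
  keep <- (pmax(pct1, pct2) >= min.pct) & (abs(diff) >= logfc.thresh)
  rownames(expr)[keep]
}

tested <- prefilter_genes(expr, group)
print(tested)
```

Raising either threshold shrinks the set of genes passed on to the statistical test, which is why it shortens run time.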

Details

If multiple res values are given, a table and heatmap will be made for each, along with saving the clusters for each in their own meta.data columns.
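As a rough illustration of the naming scheme described for res, the following base R sketch builds the expected meta.data column names. The helper cluster_colnames is hypothetical, and the exact separator and number formatting used inside ClusterDEG are assumptions based on the Cluster_res_npcs description.

```r
# Hypothetical helper: builds column names of the form Cluster_res_npcs.
# The formatting used internally by ClusterDEG may differ (assumption).
cluster_colnames <- function(res, npcs) {
  paste("Cluster", res, npcs, sep = "_")
}

res <- c(0.8, 1, 1.2)
cols <- cluster_colnames(res, npcs = 30)
print(cols)

# Per the res documentation, the last value supplied sets the default identities.
default_col <- cols[length(cols)]
print(default_col)
```

Under these assumptions, the clusters for a given resolution could then be looked up in the returned object's meta.data by the corresponding column name.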

Heatmaps created by ClusterDEG have each identity class downsampled to a maximum of 100 cells, which makes smaller clusters much more visible.
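The effect of this per-identity downsampling can be sketched in base R on simulated cluster labels. The function downsample_idents is a hypothetical stand-in and is not how ClusterDEG necessarily implements it; the cluster sizes below are invented for illustration.

```r
set.seed(1)

# Simulated identities: two large clusters and one small one (made-up sizes).
idents <- factor(c(rep("0", 500), rep("1", 250), rep("2", 40)))
names(idents) <- paste0("cell", seq_along(idents))

# Hypothetical helper: cap each identity class at max.cells randomly chosen cells.
downsample_idents <- function(idents, max.cells = 100) {
  keep <- lapply(split(names(idents), idents), function(cells) {
    if (length(cells) > max.cells) sample(cells, max.cells) else cells
  })
  unlist(keep, use.names = FALSE)
}

kept <- downsample_idents(idents)
print(table(idents[kept]))
```

After downsampling, the small cluster keeps all 40 of its cells while the large clusters are capped at 100 each, so it occupies a much larger share of the heatmap columns than it would otherwise.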

Value

A Seurat object with normalized, scaled counts and assigned clusters. If ccpca = TRUE, an additional PCA reduction named "cc" will also be present.

Author(s)

Jared Andrews

Examples

## Not run: 
library(Seurat)
scrna <- ClusterDEG(pbmc_small)

## End(Not run)

## Not run: 
# Multiple clustering resolutions
scrna <- ClusterDEG(pbmc_small, res = c(0.8, 1, 1.2))

## End(Not run)

j-andrews7/EZscRNA documentation built on Feb. 24, 2020, 10:37 a.m.