PrepareData: Prepare scRNA-seq data for reclustering.

View source: R/PrepareData.R

PrepareData    R Documentation

Prepare scRNA-seq data for reclustering.

Description

This function prepares single-cell data for reclustering analysis. The input is a Seurat object at any stage of pre-processing, or a SingleCellExperiment object, which will be converted to Seurat format. The function checks which metadata features (percentage of mitochondrial reads, cell cycle scores), assays (normalized counts), and dimension reductions (PCA and t-SNE embeddings) are already present, then runs an initial graph-based clustering.

Usage

PrepareData(
  seurat.object = NULL,
  use.sct = FALSE,
  n.HVG = 4000,
  use.parallel = TRUE,
  n.cores = 3,
  regress.mt = FALSE,
  regress.cc = FALSE,
  n.PC = "auto",
  var.cutoff = 0.15,
  which.dim.reduc = c("umap"),
  perplexity = 30,
  umap.lr = 0.05,
  initial.resolution = 0.3,
  nn.metric = "cosine",
  k.val = NULL,
  do.plot = NULL,
  random.seed = 312
)

Arguments

seurat.object

The Seurat (or SingleCellExperiment) object containing the cells you'd like to analyze. Defaults to NULL.

use.sct

Should SCTransform be used for normalization and HVG selection? Defaults to FALSE, in which case standard log1p normalization is used.
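
For reference, a minimal base-R sketch of log1p normalization, assuming Seurat's default LogNormalize settings (each cell's counts divided by its total, multiplied by a scale factor of 10,000, then log1p-transformed). The helper log_normalize and the toy matrix are hypothetical; this is not PrepareData's internal code.

```r
# Hypothetical helper: Seurat-style "LogNormalize" on a genes x cells count matrix.
log_normalize <- function(counts, scale.factor = 1e4) {
  # Divide each column (cell) by its total counts, rescale, then apply log1p.
  size <- colSums(counts)
  log1p(sweep(counts, 2, size, "/") * scale.factor)
}

# Toy matrix, filled column by column: cell1 = (10, 0, 5), cell2 = (30, 2, 5).
counts <- matrix(c(10, 0, 5, 30, 2, 5),
                 nrow = 3,
                 dimnames = list(c("GeneA", "GeneB", "GeneC"), c("cell1", "cell2")))
norm <- log_normalize(counts)
round(norm, 3)
```

Zero counts stay at zero after log1p, which keeps sparse matrices sparse.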

n.HVG

The number of highly variable genes to compute. Defaults to 4000.
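
HVG selection can be pictured with the deliberately simplified sketch below, which ranks genes by the variance of their normalized expression and keeps the top n.HVG. This is an assumption made for illustration: Seurat's FindVariableFeatures (default "vst" method) and SCTransform use more involved mean-variance models, and the helper top_variable_genes is hypothetical.

```r
# Hypothetical, simplified HVG selection: rank genes by the variance of their
# normalized expression and keep the top n.HVG. This is NOT Seurat's "vst" method.
top_variable_genes <- function(norm, n.HVG = 4000) {
  gene.var <- apply(norm, 1, var)
  n <- min(n.HVG, length(gene.var))
  names(sort(gene.var, decreasing = TRUE))[seq_len(n)]
}

# Toy expression matrix: 50 genes x 20 cells.
set.seed(312)
norm <- matrix(rnorm(50 * 20), nrow = 50,
               dimnames = list(paste0("Gene", 1:50), paste0("cell", 1:20)))
norm[1:5, ] <- norm[1:5, ] * 10  # make Gene1-Gene5 far more variable
hvg <- top_variable_genes(norm, n.HVG = 5)
hvg
```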

use.parallel

Should the Seurat data processing and the main reclustering loop be parallelized? Defaults to TRUE.

n.cores

The number of cores to be used in parallel computation if use.parallel is TRUE. Defaults to 3.

regress.mt

Should the percentage of counts from mitochondrial genes be computed and regressed out? Works for mouse and human gene names. Defaults to FALSE.
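
A minimal sketch of the mitochondrial percentage, assuming mitochondrial genes are identified by the human "MT-" or mouse "mt-" prefix (matched case-insensitively). The helper percent_mt and the toy matrix, which mixes both naming styles purely for illustration, are hypothetical; the pattern PrepareData actually uses is not stated on this page.

```r
# Hypothetical helper: percentage of each cell's counts coming from mitochondrial
# genes, matching the human ("MT-") and mouse ("mt-") prefixes case-insensitively.
percent_mt <- function(counts) {
  is.mt <- grepl("^mt-", rownames(counts), ignore.case = TRUE)
  100 * colSums(counts[is.mt, , drop = FALSE]) / colSums(counts)
}

# Toy matrix, filled column by column: cell1 = (5, 15, 80), cell2 = (0, 10, 90).
counts <- matrix(c(5, 15, 80, 0, 10, 90),
                 nrow = 3,
                 dimnames = list(c("MT-CO1", "mt-Nd1", "Actb"), c("cell1", "cell2")))
pct <- percent_mt(counts)
pct
```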

regress.cc

Should cell cycle scores be computed and regressed out? NOTE: uses human cell cycle genes. Defaults to FALSE.

n.PC

The number of PCs used as input to non-linear dimension reduction and clustering algorithms. Can be chosen by user, or set automatically using ChoosePCs. Defaults to "auto".

var.cutoff

(Optional) The proportion-of-variance-explained cutoff used when n.PC is set to "auto". Defaults to 0.15.
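
var.cutoff is a threshold on the proportion of variance explained by the principal components. The sketch below only shows how per-PC proportions (and their cumulative sums) are computed with base R's prcomp on a toy matrix; the exact rule ChoosePCs uses to turn var.cutoff into a number of PCs is not described here and is not reproduced.

```r
# Proportion of variance explained (PVE) by each principal component, computed
# with base R's prcomp on a toy cells x genes matrix.
set.seed(312)
x <- matrix(rnorm(100 * 30), nrow = 100)  # 100 cells, 30 genes
pca <- prcomp(x, center = TRUE, scale. = FALSE)

# Each PC's variance divided by the total variance across all PCs.
pve <- pca$sdev^2 / sum(pca$sdev^2)
round(head(pve), 3)
round(cumsum(pve)[1:5], 3)
```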

which.dim.reduc

(Optional) Which non-linear dimension reduction algorithms should be used? Supports "tsne", "umap", "phate", and "all". Plots will be generated using the t-SNE embedding. Defaults to c("umap"), as most users will likely not have phateR installed.

perplexity

(Optional) What perplexity value should be used when embedding cells in t-SNE space? Defaults to 30.

umap.lr

(Optional) What learning rate should be used for the UMAP embedding? Defaults to 0.05.

initial.resolution

The initial resolution parameter used in the FindClusters function. Defaults to 0.3.

nn.metric

(Optional) The distance metric to be used in computing the SNN graph. Defaults to "cosine".
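
Cosine distance compares the direction of two cells' vectors (for example, their PC scores) and ignores their magnitude. A minimal sketch of the formula, with a hypothetical helper cosine_distance:

```r
# Cosine distance between two cells' embedding vectors (e.g. PC scores):
# 1 - (a . b) / (|a| |b|). Smaller values mean more similar directions.
cosine_distance <- function(a, b) {
  1 - sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

cell1 <- c(1, 2, 0)
cell2 <- c(2, 4, 0)  # same direction as cell1, twice the magnitude
cell3 <- c(0, 0, 3)  # orthogonal to cell1
d12 <- cosine_distance(cell1, cell2)
d13 <- cosine_distance(cell1, cell3)
c(d12 = d12, d13 = d13)
```

Because cell2 is just a scaled copy of cell1, their cosine distance is (numerically) zero even though their Euclidean distance is not.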

k.val

(Optional) The number of nearest neighbors k used when building the shared nearest-neighbor (SNN) graph with FindNeighbors. Defaults to NULL, in which case k ≈ sqrt(n), where n is the number of cells.
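
A sketch of the default neighborhood size, assuming n is the number of cells and that sqrt(n) is rounded to the nearest integer; the rounding rule and the helper default_k are assumptions.

```r
# Hypothetical helper: default neighborhood size k ~ sqrt(n) for n cells.
# The rounding rule (round to nearest integer, minimum 1) is an assumption.
default_k <- function(n.cells) {
  max(1L, as.integer(round(sqrt(n.cells))))
}

default_k(2500)   # 50
default_k(10000)  # 100
```

Scaling k with sqrt(n) keeps neighborhoods small relative to the dataset as the number of cells grows.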

do.plot

(Optional) The dimension reduction view you'd like plotted. Should be one of "tsne", "umap", "phate", or "pca". Defaults to NULL.

random.seed

The seed used to control stochasticity in several functions. Defaults to 312.

Value

A Seurat object.

Author(s)

Jack Leary

References

Stuart et al. (2019). Comprehensive integration of single-cell data. Cell.

See Also

ChoosePCs

NormalizeData

FindVariableFeatures

SCTransform

FindNeighbors

FindClusters

Examples

## Not run: 
PrepareData(seurat.object,
            n.HVG = 3000,
            n.PC = 20,
            do.plot = "umap")
PrepareData(seurat.object,
            use.parallel = TRUE,
            n.cores = 6,
            initial.resolution = 0.5,
            k.val = 25)

## End(Not run)

jr-leary7/SCISSORS documentation built on April 20, 2023, 8:21 p.m.