PrepareData: Prepare scRNA-seq data for reclustering.
In jr-leary7/SCISSORS: Identify cell subpopulations in single cell RNA-seq data

PrepareData

R Documentation

Prepare scRNA-seq data for reclustering.

Description

This function prepares single-cell data for reclustering analysis. The input is a Seurat object at any stage of pre-processing, or a SingleCellExperiment object, which is converted to Seurat format. The function checks which metadata features (% mitochondrial DNA, cell cycle scores) and which assays and embeddings (normalized counts, PCA & t-SNE embeddings) are already present, then runs an initial graph-based clustering.

Usage

PrepareData(
  seurat.object = NULL,
  use.sct = FALSE,
  n.HVG = 4000,
  use.parallel = TRUE,
  n.cores = 3,
  regress.mt = FALSE,
  regress.cc = FALSE,
  n.PC = "auto",
  var.cutoff = 0.15,
  which.dim.reduc = c("umap"),
  perplexity = 30,
  umap.lr = 0.05,
  initial.resolution = 0.3,
  nn.metric = "cosine",
  k.val = NULL,
  do.plot = NULL,
  random.seed = 312
)

Arguments

`seurat.object`	The object containing the cells you'd like to analyze. Defaults to NULL.
`use.sct`	Should `SCTransform` be used for normalization / HVG selection? Defaults to FALSE, which equates to using typical log1p-normalization.
`n.HVG`	The number of highly variable genes to compute. Defaults to 4000.
`use.parallel`	Should the `Seurat` data reprocessing & the main reclustering loop be parallelized? Defaults to TRUE.
`n.cores`	The number of cores to be used in parallel computation if `use.parallel` is TRUE. Defaults to 3.
`regress.mt`	Should the percentage of mitochondrial DNA be computed and regressed out? Works for mouse / human gene names. Defaults to FALSE.
`regress.cc`	Should cell cycle scores be computed & regressed out? NOTE: uses human cell cycle genes. Defaults to FALSE.
`n.PC`	The number of PCs used as input to non-linear dimension reduction and clustering algorithms. Can be chosen by user, or set automatically using `ChoosePCs`. Defaults to "auto".
`var.cutoff`	(Optional) The proportion of variance explained cutoff to be used when `n.PC` is set to "auto". Defaults to 0.15.
`which.dim.reduc`	(Optional) Which non-linear dimension reduction algorithms should be used? Supports "tsne", "umap", "phate", and "all". Plots will be generated using the t-SNE embedding. Defaults to c("umap"), as most users will likely not have `phateR` installed.
`perplexity`	(Optional) What perplexity value should be used when embedding cells in t-SNE space? Defaults to 30.
`umap.lr`	(Optional) What learning rate should be used for the UMAP embedding? Defaults to 0.05.
`initial.resolution`	The initial resolution parameter used in the `FindClusters` function. Defaults to 0.3.
`nn.metric`	(Optional) The distance metric to be used in computing the SNN graph. Defaults to "cosine".
`k.val`	(Optional) The nearest-neighbors parameter k to be used when creating the shared nearest-neighbor graph with `FindNeighbors`. Defaults to NULL, in which case `k \approx \sqrt{n}` is used.
`do.plot`	(Optional) The dimension reduction view you'd like plotted. Should be one of "tsne", "umap", "phate", or "pca". Defaults to NULL.
`random.seed`	The seed used to control stochasticity in several functions. Defaults to 312.
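
Two of the defaults above depend on the data rather than on a fixed value: `n.PC = "auto"` and `k.val = NULL`. The base-R sketch below illustrates what such rules could look like on simulated data. It is not the SCISSORS implementation. It assumes that n in `k \approx \sqrt{n}` is the number of cells, and that the "auto" rule keeps the smallest number of PCs whose cumulative proportion of variance explained reaches `var.cutoff`. Neither detail is spelled out in this page, and all object names (`expr`, `n_cells`, `k_default`, `n_pc_auto`) are hypothetical.

```r
# Illustrative sketch only; NOT the SCISSORS source code.
set.seed(312)

# Simulated stand-in for an expression matrix: 400 cells (rows) x 50 genes (columns).
n_cells <- 400
expr <- matrix(rnorm(n_cells * 50), nrow = n_cells, ncol = 50)

# k.val = NULL: neighborhood size of roughly sqrt(n),
# with n ASSUMED to be the number of cells.
k_default <- round(sqrt(n_cells))

# n.PC = "auto": ASSUMED rule, i.e. the smallest number of PCs whose
# cumulative proportion of variance explained reaches var.cutoff.
var_cutoff <- 0.15
pca <- prcomp(expr, center = TRUE, scale. = TRUE)
prop_var <- pca$sdev^2 / sum(pca$sdev^2)
n_pc_auto <- which(cumsum(prop_var) >= var_cutoff)[1]

print(k_default)
print(n_pc_auto)
```

To bypass either rule, pass explicit values such as `n.PC = 20` or `k.val = 25`, as in the Examples section.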

Value

A Seurat object.

Author(s)

Jack Leary

References

Stuart et al (2019). Comprehensive integration of single-cell data. Cell.

Examples

## Not run: 
PrepareData(seurat.object,
            n.HVG = 3000,
            n.PC = 20,
            do.plot = "umap")
PrepareData(seurat.object,
            use.parallel = TRUE,
            n.cores = 6,
            initial.resolution = .5,
            k.val = 25)

## End(Not run)

jr-leary7/SCISSORS documentation built on April 20, 2023, 8:21 p.m.