seuratpipeline: Basic SC Pipeline for QC and clustering


View source: R/basicpipeline.R

Description

Basic single-cell (SC) pipeline for a single input dataset. Performs QC and clustering of the input dataset.

Usage

seuratpipeline(
  data,
  format,
  transcript_gene_file,
  project = NULL,
  baselinefilter.mad = NULL,
  baseline.mito.filter = NULL,
  madmax.dist.percentmito.baseline = NULL,
  baseline.libsize.filter = NULL,
  madmax.dist.nCount_RNA.baseline = NULL,
  removemitomaxclust = NULL,
  iterativefilter = NULL,
  iterativefilter.libsize = NULL,
  iterativefilter.libsize.twosided = NULL,
  iterativefilter.libsize.lefttail = NULL,
  iterativefilter.mito = NULL,
  cellcycleregression = NULL,
  PCAgenelist = NULL,
  jackstraw = NULL,
  dims = NULL,
  res = NULL
)

Arguments

data

a string containing a filepath whose format is specified by the format parameter, or a gene expression matrix (genes x cells) if format is 'mat'.

format

a string: 'dir' for cellranger directory output, 'h5' for cellranger h5 output, 'kallisto' for kallisto|bustools pipeline output, or 'mat' for a gene expression matrix (genes x cells).
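
The four accepted values can be checked up front with base R's match.arg. The sketch below is only an illustration: check_format is a hypothetical helper, not a function of this package, and the descriptions it returns simply restate the text above.

```r
# Hypothetical helper (not part of the package): validate `format` and
# describe what `data` is expected to be for that value.
check_format <- function(format = c("dir", "h5", "kallisto", "mat")) {
  format <- match.arg(format)  # errors on anything outside the four choices
  switch(format,
    dir      = "path to a cellranger output directory",
    h5       = "path to a cellranger .h5 file",
    kallisto = "path to kallisto|bustools output",
    mat      = "in-memory genes x cells matrix"
  )
}

check_format("h5")
```

Passing an unquoted name such as format = h5 fails before any validation runs, because R looks for an object called h5; the value must be a quoted string.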

transcript_gene_file

a string containing a filepath for the "transcript_gene" conversion file. Only used for Kallisto|bustools workflow.

baselinefilter.mad

T/F; whether to perform "global" QC, i.e., without pre-clustering. Default = False.

baseline.mito.filter

T/F; whether to perform global maximum mitochondrial content filtration using median absolute deviation threshold; will only work if baselinefilter.mad is set to True. Default is True.

madmax.dist.percentmito.baseline

a numeric, or the string 'predict'. If a numeric is provided, it is used as the median absolute deviation threshold for the global mito cutoff. If set to 'predict', the pipeline will attempt to learn the cutoff from the data. Default = 'predict'.

baseline.libsize.filter

T/F; whether to perform global minimal lib size filtration using median absolute deviation threshold; will only work if baselinefilter.mad is set to True. Default is True.

madmax.dist.nCount_RNA.baseline

a numeric, or the string 'predict'. If a numeric is provided, it is used as the median absolute deviation threshold for the global libsize cutoff. If set to 'predict', the pipeline will attempt to learn the cutoff from the data. Default = 'predict'.
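
To make the numeric option concrete, here is a minimal base R sketch of a median absolute deviation (MAD) cutoff. It is not the package's code: the helper names, the simulated values, the choice of 3 MADs, and the use of a log10 scale for library size are all assumptions made for illustration.

```r
# Hypothetical helpers: cutoffs placed `nmads` MADs from the median.
# stats::mad() scales by 1.4826 by default, so it approximates an SD.
mad_upper_cutoff <- function(x, nmads) median(x) + nmads * mad(x)
mad_lower_cutoff <- function(x, nmads) median(x) - nmads * mad(x)

set.seed(1)
# Simulated per-cell metrics: 200 ordinary cells plus 3 damaged cells
# (high mitochondrial percentage, very small library size).
percent_mito <- c(rnorm(200, mean = 5, sd = 1), 30, 40, 55)
ncount_rna   <- c(exp(rnorm(200, mean = log(5000), sd = 0.3)), 200, 150, 100)

# Right-tail filter on mito content, left-tail filter on library size.
keep <- percent_mito <= mad_upper_cutoff(percent_mito, nmads = 3) &
  log10(ncount_rna) >= mad_lower_cutoff(log10(ncount_rna), nmads = 3)

table(keep)
```

A larger threshold keeps more cells; the 'predict' setting replaces the hand-picked number with one learned from the data, which this sketch does not attempt.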

removemitomaxclust

T/F; whether to identify and remove clusters with abnormally high mitochondrial content after first-pass clustering. If no baseline mito filtration is used, there will almost certainly be a mito-driven cluster. Identification is via Grubbs' test for outliers, based on Lukasz Komsta's implementation in the outliers package. Default is True.
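
As a rough illustration of the idea, the sketch below applies a one-sided Grubbs' test to per-cluster mitochondrial summaries using base R only. It is not the package's code and does not call the outliers package (which provides grubbs.test); the helper name, the per-cluster values, and alpha = 0.05 are assumptions.

```r
# Hypothetical helper: one-sided Grubbs' test asking whether the largest
# value is an outlier at significance level `alpha`.
grubbs_max_is_outlier <- function(x, alpha = 0.05) {
  n <- length(x)
  stopifnot(n >= 3)
  g <- (max(x) - mean(x)) / sd(x)                      # test statistic
  t_crit <- qt(1 - alpha / n, df = n - 2)              # one-sided t quantile
  g_crit <- (n - 1) / sqrt(n) * sqrt(t_crit^2 / (n - 2 + t_crit^2))
  g > g_crit
}

# Made-up mean percent.mito per cluster; cluster c6 is mito-driven.
cluster_mito <- c(c0 = 4.1, c1 = 5.3, c2 = 3.8, c3 = 4.9,
                  c4 = 4.4, c5 = 5.0, c6 = 38.2)

if (grubbs_max_is_outlier(cluster_mito)) {
  message("candidate cluster to remove: ", names(which.max(cluster_mito)))
}
```

Grubbs' test assumes the remaining values are roughly normal and detects one outlier at a time, so with very few clusters it has little power.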

iterativefilter

T/F; whether to perform iterative filtering on first-pass filtering. Default is True.

iterativefilter.libsize

one of two strings ('twosided' or 'lefttail'), or False. 'twosided' will attempt to learn median absolute deviation cutoffs for both the minimum and the maximum, to catch debris and, ostensibly, doublets. 'lefttail' will attempt to learn a median absolute deviation threshold for the minimum only, to catch debris; doublets may not be captured accurately by maximum cutoffs, as this is more an artifact of sequencing than of cell suspension. False will skip this step. Default is 'lefttail'.
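
The difference between the two modes can be sketched in base R as below. This is an illustration under stated assumptions, not the package's implementation: the helper name is hypothetical, the threshold is fixed at 3 MADs rather than learned, and library sizes are compared on a log10 scale.

```r
# Hypothetical helper: within each cluster, flag cells whose log10 library
# size lies inside MAD-based bounds. `mode` mirrors the documented options.
filter_libsize_by_cluster <- function(ncount, cluster,
                                      mode = c("lefttail", "twosided"),
                                      nmads = 3) {
  mode <- match.arg(mode)
  x <- log10(ncount)
  keep <- logical(length(x))
  for (cl in unique(cluster)) {
    idx <- which(cluster == cl)
    centre <- median(x[idx])
    spread <- mad(x[idx])
    ok <- x[idx] >= centre - nmads * spread            # minimum (debris)
    if (mode == "twosided") {
      ok <- ok & x[idx] <= centre + nmads * spread     # maximum (doublets?)
    }
    keep[idx] <- ok
  }
  keep
}

# Made-up library sizes: cell 6 looks like debris, cell 12 is unusually large.
ncount  <- c(4000, 4200, 3900, 4100, 4050, 300,
             9000, 9500, 8800, 9200, 9100, 60000)
cluster <- rep(c("a", "b"), each = 6)

lefttail <- filter_libsize_by_cluster(ncount, cluster, mode = "lefttail")
twosided <- filter_libsize_by_cluster(ncount, cluster, mode = "twosided")

which(!lefttail)  # only cell 6 is dropped
which(!twosided)  # cells 6 and 12 are dropped
```

Computing the bounds per cluster rather than globally keeps a cluster of naturally small cells from being discarded just because other clusters have larger libraries.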

iterativefilter.mito

T/F; whether to learn right-tail median abs. dev. thresholds and filter maximal mitochondrial content from each cluster. May incorrectly remove mito-okay cells while missing true mito-hi cells. Default = F.

cellcycleregression

one of two strings ('total' or 'difference'), or False. If False, cell cycle scores are still calculated but no correction is attempted. Default is False. See: https://satijalab.org/seurat/v3.1/cell_cycle_vignette.html

PCAgenelist

a character vector of gene names to use for PCA. If NULL, defaults to the highly variable genes called by SCT. Default is NULL.

jackstraw

T/F; whether to perform jackstraw to score significant PCs for use in clustering / dimreduction. May be incompatible with SCT. Default is false.

dims

an integer range. Controls graph construction prior to clustering and dimensionality reduction for visualization. Connotes which PC dimensions to use in clustering. Defaults to 1:30.

res

a numeric, vector of numerics, or range of numerics. Controls assignment of cells to clusters. Connotes the "resolution" parameter used as correction in Louvain clustering. Defaults to c(0.5, 1.0, 1.5).

Value

Produces a series of QC plots and returns a Seurat object. If the result is not assigned to a variable, the object is printed to standard output.

Examples

## Not run: 
pdf('qcplots.pdf')
sobj <- seuratpipeline('datafilepath.h5', format = 'h5')
dev.off()

## End(Not run)

apf2139/tamlabscpipeline documentation built on July 23, 2021, 11 a.m.