View source: R/qc_diagnostics.R
qc_diagnostic | R Documentation |
Takes Cell Ranger output, namely the (i) feature, (ii) barcode and (iii) matrix files within the respective folder(s) filtered_feature_bc_matrix and, optionally, raw_feature_bc_matrix, generates a Seurat object with default settings and computes quality metrics. These metrics are then used to cluster the cells. Groups of cells with low quality scores (e.g. a high mt-fraction plus a low number of detected features) will likely cluster together, which makes them easy to filter.

If raw_feature_bc_matrix is provided, SoupX may be run. In that case the whole pipeline is run twice: once with the original count matrix and once with a corrected count matrix obtained from SoupX with default settings. If raw_feature_bc_matrix is not available, only decontX is available to check for ambient RNA contamination. Both estimates of ambient RNA (SoupX and decontX) become part of the clustering by qc metrics if the respective argument is set to TRUE. scDblFinder is run to detect doublets, which is another quality metric.

In addition to qc metrics, a number of principal components (PCs) from feature expression may be added to clustering and dimension reduction, because the feature composition of low quality transcriptomes may also be skewed. This will likely make these cells cluster separately even more clearly. Apart from clustering with meta data, an analysis based on feature expression values only is also run. That may allow cluster-wise application of additional filters for qc metrics after clearly low quality transcriptomes have been eliminated in a first round based on qc metric clustering.

If data_dirs contains multiple samples, the samples are integrated with harmony. (i) Detection of soup (ambient RNA) by SoupX and decontX, (ii) detection of doublets and (iii) calculation of residuals from the linear model of nCount_RNA_log vs nFeature_RNA_log are done sample-wise when multiple data_dirs are detected/provided. Results are written into the common Seurat object, though, and its merged and harmonized PCA space is used to cluster the cells based on feature expression (phenotypes).
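The sample-wise residual calculation mentioned above can be illustrated with a minimal base R sketch. It is not the package's implementation: the toy data, the column name residuals, the use of the natural logarithm, and the choice of nCount_RNA_log as the response variable are all assumptions made for illustration.

```r
# Toy meta data for two hypothetical samples (all values are simulated).
set.seed(1)
meta <- data.frame(
  sample = rep(c("s1", "s2"), each = 50),
  nFeature_RNA = round(runif(100, 500, 5000))
)
# Simulate counts that scale with the number of detected features.
meta$nCount_RNA <- round(meta$nFeature_RNA * runif(100, 2, 4))

# Assumption: natural log; the function may use a different base.
meta$nCount_RNA_log <- log(meta$nCount_RNA)
meta$nFeature_RNA_log <- log(meta$nFeature_RNA)

# Fit one linear model per sample and store the residual of every cell.
# Assumption: nCount_RNA_log is the response, nFeature_RNA_log the predictor.
meta$residuals <- NA_real_
for (s in unique(meta$sample)) {
  idx <- meta$sample == s
  fit <- lm(nCount_RNA_log ~ nFeature_RNA_log, data = meta[idx, ])
  meta$residuals[idx] <- residuals(fit)
}
```

Cells with large absolute residuals deviate from the count-to-feature relationship of their own sample, which is why fitting per sample rather than on the merged data avoids mixing sample-level offsets into the metric.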
qc_diagnostic(
data_dirs,
nhvf = 2000,
npcs = 10,
resolution = 0.8,
resolution_SoupX = 0.6,
resolution_meta = 0.8,
n_PCs_to_meta_clustering = 2,
scDblFinder = T,
SoupX = F,
decontX = F,
return_SoupX = T,
cells = NULL,
invert_cells = F,
feature_rm = NULL,
feature_aggr = NULL,
...
)
data_dirs |
list or vector of parent directory(ies) which will be searched for folders called "filtered_feature_bc_matrix"; on the same level where each of these folders is found, a "raw_feature_bc_matrix" folder may exist to enable SoupX; if one "raw_feature_bc_matrix" is missing, SoupX is disabled for all others as well |
nhvf |
number of highly variable features to use in each of the procedures |
npcs |
number of principal components to calculate, e.g. 12 for diverse data sets and 8 for isolated subsets |
resolution |
resolution (Louvain algorithm) for clustering based on feature expression |
resolution_SoupX |
resolution (Louvain algorithm) for SoupX analysis |
resolution_meta |
resolution(s) (Louvain algorithm) for clustering based on qc meta data and, optionally, additional PC dimensions (n_PCs_to_meta_clustering) |
n_PCs_to_meta_clustering |
how many principal components (PCs) from phenotypic clustering to add to qc meta data; this will generate a mixed clustering (PCs from phenotypes (RNA) and qc meta data like pct mt and nCount_RNA); the more PCs are added, the greater the phenotypic influence becomes; one or more integers can be supplied to explore the effect; pass 0 to have no PCs included in meta clustering; e.g. when n_PCs_to_meta_clustering = 3, PCs 1-3 are used; when n_PCs_to_meta_clustering = 1, only PC 1 is used. |
scDblFinder |
logical, whether to run the doublet detection algorithm from scDblFinder |
SoupX |
logical, whether to run SoupX. If TRUE, raw_feature_bc_matrix is needed. |
decontX |
logical, whether to run celda::decontX to estimate RNA soup (contaminating ambient RNA molecules) |
return_SoupX |
logical, whether to return a full Seurat object and diagnostics from SoupX (TRUE) or to run SoupX without these returns and only include the soup metric as an additional quality-control metric along with pct_mt, nCount_RNA etc. Will be set to FALSE if more than one data_dir with filtered_feature_bc_matrix is supplied, so TRUE is only possible when data sets are provided one by one. |
cells |
vector of cell names to include; note the trailing '-1' in cell names |
invert_cells |
invert cell selection; if TRUE, cell names provided in 'cells' are excluded |
feature_rm |
character vector of features to remove from count matrices; removal is done after aggregation (if feature_aggr is provided) |
feature_aggr |
named list of character vectors of features to aggregate; names of list entries are the names of the aggregated features; aggregation of counts is simply done by addition; aggregation is done before feature removal (if feature_rm is provided) |
... |
additional arguments passed to SoupX::autoEstCont |
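The order of operations described for feature_aggr and feature_rm (aggregate by addition first, then remove) can be sketched in base R as follows. This is an illustration under assumptions, not the function's actual code: the gene names and counts are made up, a dense matrix stands in for the sparse matrix that Cell Ranger output normally yields, and dropping the original member features after aggregation is an assumed behavior.

```r
# Toy count matrix: features in rows, cells in columns (values are made up).
counts <- matrix(
  c(1, 0, 2,
    3, 1, 0,
    0, 4, 1,
    5, 5, 5),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("geneA1", "geneA2", "geneB", "geneC"),
                  c("cell1-1", "cell2-1", "cell3-1"))
)

# Hypothetical inputs in the documented shape.
feature_aggr <- list(geneA = c("geneA1", "geneA2"))
feature_rm <- c("geneC")

# Step 1: aggregate. Counts of the member features are added per cell and
# stored under the name of the list entry.
for (new_name in names(feature_aggr)) {
  members <- intersect(feature_aggr[[new_name]], rownames(counts))
  summed <- colSums(counts[members, , drop = FALSE])
  # Assumption: the original member rows are dropped after aggregation.
  counts <- counts[!rownames(counts) %in% members, , drop = FALSE]
  counts <- rbind(
    counts,
    matrix(summed, nrow = 1, dimnames = list(new_name, colnames(counts)))
  )
}

# Step 2: remove features listed in feature_rm (after aggregation).
counts <- counts[!rownames(counts) %in% feature_rm, , drop = FALSE]
```

Because removal happens after aggregation, a name listed in feature_rm can also refer to an aggregated feature created in step 1.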
a list containing the Seurat object and a data frame with marker genes for the clusters based on feature expression