scDblFinder: scDblFinder
In scDblFinder: scDblFinder

Description Usage Arguments Details Value Examples

Identification of heterotypic (or neotypic) doublets in single-cell RNAseq using cluster-based generation of artifical doublets.

scDblFinder(
  sce,
  clusters = NULL,
  samples = NULL,
  trajectoryMode = FALSE,
  artificialDoublets = NULL,
  knownDoublets = NULL,
  use.cxds = TRUE,
  nfeatures = 1000,
  dims = 20,
  dbr = NULL,
  dbr.sd = 0.015,
  k = NULL,
  includePCs = 1:5,
  propRandom = 0.1,
  propMarkers = 0,
  returnType = c("sce", "table", "full"),
  score = c("xgb", "xgb.local.optim", "weighted", "ratio"),
  metric = "aucpr",
  nrounds = 50,
  max_depth = 5,
  iter = 1,
  threshold = TRUE,
  verbose = is.null(samples),
  BPPARAM = SerialParam(),
  ...
)

`sce`	A `SummarizedExperiment-class`, `SingleCellExperiment-class`, or array of counts.
`clusters`	The optional cluster assignments (if omitted, will run clustering). This is used to make doublets more efficiently. `clusters` should either be a vector of labels for each cell, or the name of a colData column of `sce`. Alternatively, if it is a single integer, will determine how many clusters to create (using k-means clustering). This options should be used when distinct subpopulations are not expected in the data (e.g. trajectories).
`samples`	A vector of the same length as cells (or the name of a column of `colData(x)`), indicating to which sample each cell belongs. Here, a sample is understood as being processed independently. If omitted, doublets will be searched for with all cells together. If given, doublets will be searched for independently for each sample, which is preferable if they represent different captures. If your samples were multiplexed using cell hashes, want you want to give here are the different batches/wells (i.e. independent captures, since doublets cannot arise across them) rather than biological samples.
`trajectoryMode`	Logical; whether to generate doublets in trajectory mode (i.e. for datasets with gradients rather than separated subpopulations). See `vignette("scDblFinder")` for more details.
`artificialDoublets`	The approximate number of artificial doublets to create. If `NULL`, will be the maximum of the number of cells or `5*nbClusters^2`.
`knownDoublets`	An optional logical vector of known doublets (e.g. through cell barcodes), or the name of a colData column of 'sce' containing that information.
`use.cxds`	Logical; whether to use the 'cxds' scores in addition to information from artificial/known doublets as part of the predictors.
`nfeatures`	The number of top features to use (default 1000)
`dims`	The number of dimensions used.
`dbr`	The expected doublet rate. By default this is assumed to be 1% per thousand cells captured (so 4% among 4000 thousand cells), which is appropriate for 10x datasets. Corrections for homeotypic doublets will be performed on the given rate.
`dbr.sd`	The uncertainty range in the doublet rate, interpreted as a +/- around 'dbr'. During thresholding, deviation from the expected doublet rate will be calculated from these boundaries, and will be considered null within these boundaries.
`k`	Number of nearest neighbors (for KNN graph). If more than one value is given, the doublet density will be calculated at each k (and other values at the highest k), and all the information will be used by the classifier. If omitted, a reasonable set of values is used.
`includePCs`	The index of principal components to include in the predictors (e.g. 'includePCs=1:2').
`propRandom`	The proportion of the artificial doublets which should be made of random cells (as opposed to inter-cluster combinations).
`propMarkers`	The proportion of features to select based on marker identification.
`returnType`	Either "sce" (default), "table" (to return the table of cell attributes including artificial doublets), or "full" (returns an SCE object containing both the real and artificial cells.
`score`	Score to use for final classification.
`metric`	Error metric to optimize during training (e.g. 'merror', 'logloss', 'auc', 'aucpr').
`nrounds`	Maximum rounds of boosting. If NULL, will be determined through cross-validation. When the training is based only on simulated doublets, we generally find lower limits to outperform cross-validation.
`max_depth`	Maximum depth of decision trees
`iter`	A positive integer indicating the number of scoring iterations (ignored if ‘score' isn’t based on classifiers). At each iteration, real cells that would be called as doublets are excluding from the training, and new scores are calculated. Recommended values are 1 or 2.
`threshold`	Logical; whether to threshold scores into binary doublet calls
`verbose`	Logical; whether to print messages and the thresholding plot.
`BPPARAM`	Used for multithreading when splitting by samples (i.e. when 'samples!=NULL'); otherwise passed to eventual PCA and K/SNN calculations.
`...`	further arguments passed to `getArtificialDoublets`.

This function generates artificial doublets from clusters of real cells, evaluates their prevalence in the neighborhood of each cells, and uses this along with additional features to classify doublets. The approach is complementary to doublets identified via cell hashes and SNPs in multiplexed samples: the latter can identify doublets formed by cells of the same type from two samples, which are nearly undistinguishable from real cells transcriptionally, but cannot identify doublets made by cells of the same sample. See vignette("scDblFinder") for more details on the method.

When multiple samples/captures are present, they should be specified using the samples argument. Although the classifier will be trained globally, thresholding and the more computationally-intensive steps will be performed separately for each sample (in parallel if BPPARAM is given).

When inter-sample doublets are available, they can be provided to 'scDblFinder' through the knownDoublets argument to improve the identification of further doublets.

The sce object with several additional colData columns, in particular 'scDblFinder.score' (the final score used) and 'scDblFinder.class' (whether the cell is called as 'doublet' or 'singlet'). See vignette("scDblFinder") for more details; for alternative return values, see the 'returnType' argument.

library(SingleCellExperiment)
sce <- mockDoubletSCE()
sce <- scDblFinder(sce, dbr=0.1)
table(truth=sce$type, call=sce$scDblFinder.class)

scDblFinder documentation built on Nov. 8, 2020, 5:48 p.m.

scDblFinder index

README.md Detecting clusters of doublet cells with DE analyses Introduction to the scDblFinder package Recovering intra-sample doublets scDblFinder Scoring potential doublets from simulated densities

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

scDblFinder
scDblFinder

scDblFinder: scDblFinder
In scDblFinder: scDblFinder

Description

Usage

Arguments

Details

Value

Examples

Related to scDblFinder in scDblFinder...

R Package Documentation

Browse R Packages

We want your feedback!

scDblFinder scDblFinder

scDblFinder: scDblFinder In scDblFinder: scDblFinder

Description

Usage

Arguments

Details

Value

Examples

Related to scDblFinder in scDblFinder...

R Package Documentation

Browse R Packages

We want your feedback!

scDblFinder
scDblFinder

scDblFinder: scDblFinder
In scDblFinder: scDblFinder