scrublet_R: Scrublet


Description

Runs Scrublet on a CellDataSet to score cell doublets. See the preprint "Scrublet: computational identification of cell doublets in single-cell transcriptomic data" by Samuel L Wolock, Romain Lopez, and Allon M Klein. bioRxiv 357368; doi: https://doi.org/10.1101/357368

Usage

scrublet_R(
  cds,
  python_home = system("which python", intern = TRUE),
  return_results_only = FALSE,
  min_counts = 2,
  min_cells = 3,
  expected_doublet_rate = 0.06,
  min_gene_variability_pctl = 85,
  n_prin_comps = 50,
  sim_doublet_ratio = 2,
  n_neighbors = NULL
)
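
The default for python_home shells out to `which python`, which returns nothing useful when no Python is on the PATH. A minimal sketch of checking the path up front with base R is shown below. The helper `is_usable_python` is hypothetical and not part of m3addon, and the check only confirms that an executable exists, not that Scrublet is installed in it.

```r
# Locate a Python executable on the PATH. Sys.which() returns "" when the
# command is not found; unname() drops the name attribute it attaches.
python_path <- unname(Sys.which("python"))

# Hypothetical helper (not part of m3addon): TRUE only for a single,
# non-empty path that exists on disk.
is_usable_python <- function(path) {
  is.character(path) && length(path) == 1 && nzchar(path) && file.exists(path)
}

if (is_usable_python(python_path)) {
  message("Found Python at: ", python_path)
} else {
  message("No Python on PATH; pass python_home explicitly to scrublet_R().")
}
```

If the check fails, python_home can be set by hand to the interpreter of the environment where Scrublet was installed.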

Arguments

cds

The CellDataSet on which to run Scrublet.

python_home

The Python installation in which Scrublet is installed. Defaults to the path returned by which python.

return_results_only

logical (optional, default=FALSE). If TRUE, return only a list containing the Scrublet output.

min_counts

int (optional, default=2). See the Scrublet reference.

min_cells

int (optional, default=3). See the Scrublet reference.

expected_doublet_rate

float (optional, default=0.06). The fraction of transcriptomes that are doublets, typically 0.05-0.1. Results are not particularly sensitive to this parameter. The example value in the Scrublet reference comes from the Chromium User Guide: https://support.10xgenomics.com/permalink/3vzDu3zQjY0o2AqkkkI4CC

min_gene_variability_pctl

int (optional, default=85). See the Scrublet reference.

n_prin_comps

int (optional, default=50). Number of principal components to use. See the Scrublet reference.

sim_doublet_ratio

numeric (optional, default=2). The number of doublets to simulate, relative to the number of observed transcriptomes. This should be high enough that all doublet states are well represented by simulated doublets, but setting it too high is computationally expensive. Values as low as 0.5 give very similar results for the datasets that have been tested.

n_neighbors

int (optional, default=NULL). Number of neighbors used to construct the KNN classifier of observed transcriptomes and simulated doublets. The default value of round(0.5*sqrt(n_cells)) generally works well.
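
To make the scale of these parameters concrete, the sketch below works through the arithmetic implied by the descriptions of expected_doublet_rate, sim_doublet_ratio, and n_neighbors for a given number of cells. The helper `scrublet_sizes` is hypothetical and belongs to neither m3addon nor Scrublet, and rounding the counts to whole numbers is an assumption made here for illustration.

```r
# Hypothetical helper (not part of m3addon or Scrublet): derive the sizes
# implied by the parameter descriptions for a dataset of n_cells cells.
scrublet_sizes <- function(n_cells,
                           expected_doublet_rate = 0.06,
                           sim_doublet_ratio = 2,
                           n_neighbors = NULL) {
  stopifnot(n_cells > 0, sim_doublet_ratio > 0,
            expected_doublet_rate >= 0, expected_doublet_rate <= 1)
  # Documented default when n_neighbors is not supplied.
  if (is.null(n_neighbors)) {
    n_neighbors <- round(0.5 * sqrt(n_cells))
  }
  list(
    # Assumption: counts are rounded to whole numbers for illustration.
    expected_doublets  = round(expected_doublet_rate * n_cells),
    simulated_doublets = round(sim_doublet_ratio * n_cells),
    n_neighbors        = n_neighbors
  )
}

sizes <- scrublet_sizes(n_cells = 10000)
print(sizes)
```

With the defaults and 10,000 cells this gives roughly 600 expected doublets, 20,000 simulated doublets, and 50 neighbors. Lowering sim_doublet_ratio to 0.5 cuts the simulated set to a quarter of that size.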

Value

The input CellDataSet with an additional column added to pData with both the doublet_score output from scrublet, and


scfurl/m3addon documentation built on Aug. 9, 2021, 5:30 p.m.