Estimate the saturation of UMI or gene detection based on rarefaction of the mapped read counts from a 10X RNAseq sample. This function takes the read counts for each sample and sequentially rarefies them at different levels to determine how thoroughly UMIs or genes are being sampled. Optional settings include the number of intermediate points to sample (default=6), the number of times to sample at each depth (default=5), and the minimum number of counts for a gene to be counted as "detected" (default=1).
estimate_10X_saturation(
  counts, max_reads=Inf,
  method="sampling",
  depths=6, nreps=5,
  min_counts=1, min_cpm=NULL,
  verbose=FALSE)
molecule_info | the molecule_info data frame of a 10X gene-barcode matrix. Should include columns for barcode, gene, UMI, and reads.
max_reads | the maximum number of reads to sample at. By default, this value is the maximum of total read counts across all barcodes, genes, and UMIs.
method | character, either "division" or "sampling". Method "sampling" is slower but more realistic, and yields smoother curves. Method "division" is faster but coarser and less realistic. See Details for a more complete description.
depths | the read depths to sample at. Either a vector of read depths at which to sample, or a single integer value giving the number of evenly-spaced depths at which to sample. 0 is always included as an additional depth to make plotting easier.
nreps | the number of samples to take for each library at each depth. With well-sampled libraries, 1 should be sufficient. With poorly-sampled libraries, sampling variance may be substantial, requiring higher values. Ignored if
min_counts | the minimum number of counts for a UMI/gene to be counted as detected. UMIs/genes with sample counts >= this value are considered detected. Defaults to 1. Set to NULL to use min_cpm.
min_cpm | the minimum counts per million for a UMI/gene to be counted as detected. UMIs/genes with counts per million >= this value are considered detected. Either this or min_counts should be specified, but not both; specifying both yields an error. Defaults to NULL.
verbose | logical, whether to output the status of the estimation.
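The handling of depths and of the two detection thresholds described above can be sketched in a few lines of R. This is a minimal illustration, not the function's actual code: the helper names expand_depths and check_threshold are hypothetical, and the choice to space the depths from max_reads/depths up to max_reads is an assumption, since the argument text only says "evenly-spaced".

```r
# Hypothetical helper: expand `depths` as the argument text describes.
expand_depths <- function(depths, max_reads) {
  if (length(depths) == 1) {
    # A single integer is read as the number of evenly spaced depths.
    # Assumption: points run from max_reads / depths up to max_reads.
    depths <- seq(from = max_reads / depths, to = max_reads,
                  length.out = depths)
  }
  # 0 is always included as an additional depth.
  sort(unique(c(0, depths)))
}

# Hypothetical helper: supplying both thresholds is an error.
check_threshold <- function(min_counts = 1, min_cpm = NULL) {
  if (!is.null(min_counts) && !is.null(min_cpm)) {
    stop("specify either min_counts or min_cpm, not both")
  }
  invisible(TRUE)
}

expand_depths(4, max_reads = 1000)            # 0 plus four evenly spaced depths
expand_depths(c(100, 500), max_reads = 1000)  # explicit depths, with 0 added
check_threshold(min_counts = NULL, min_cpm = 5)
```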
The method parameter determines the approach used to estimate the number of UMIs or genes detected at different read depths. Method "division" simply divides the counts for each UMI/gene by a series of scaling factors, then counts the UMIs/genes whose adjusted counts meet the detection threshold. Method "sampling" generates a number of sets (nreps) of simulated counts for each library at each sequencing depth, by probabilistically simulating counts using the observed proportions. It then counts the number of UMIs/genes that meet the detection threshold in each simulation, and takes the arithmetic mean of these values for each library at each depth.
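The difference between the two methods may be easier to see on a toy example. The sketch below is not the package's implementation: detect_division and detect_sampling are hypothetical names, counts is an invented vector of per-gene read counts for a single library, multinomial sampling via rmultinom is one plausible reading of "probabilistically simulating counts using the observed proportions", and the name sat.var for the variance is assumed.

```r
# Toy sketch of the two estimation methods; not the package's actual code.
# `counts` is a hypothetical vector of read counts per gene for one library.

detect_division <- function(counts, depth, min_counts = 1) {
  # Scale every count by depth / total reads, then apply the threshold.
  scaled <- counts * depth / sum(counts)
  sum(scaled >= min_counts)
}

detect_sampling <- function(counts, depth, nreps = 5, min_counts = 1) {
  # Draw nreps simulated libraries of `depth` reads each, using the
  # observed proportions as multinomial probabilities (an assumption).
  sims <- rmultinom(nreps, size = depth, prob = counts / sum(counts))
  detected <- colSums(sims >= min_counts)
  # Mean across replicates, plus the variance (NA when nreps is 1).
  c(sat = mean(detected), sat.var = var(detected))
}

set.seed(1)
counts <- c(500, 300, 150, 40, 8, 2)
detect_division(counts, depth = 100)
detect_sampling(counts, depth = 100, nreps = 5)
```

With "division", a gene is either above or below the threshold at a given depth, so the curve moves in steps. With "sampling", rare genes are detected in some replicates and missed in others, so averaging over nreps gives a smoother curve at the cost of extra computation.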
A data frame containing nreps rows for each depth, with one row for each sample at each depth. Columns include "sample" (the sample identifier), "depth" (the depth value for that iteration), and "sat" (the number of genes or UMIs detected at that depth for that sample). For method "sampling", it includes an additional column with the variance of the number of genes or UMIs detected across all replicates of each sample at each depth.
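As a rough illustration of working with a result of this shape, the sketch below builds a toy data frame with the documented "sample", "depth", and "sat" columns and arranges it as one saturation curve per sample. The sample names and all values are invented for the example and are not output from the function.

```r
# Toy data frame with the documented columns; the values are invented.
sat_df <- data.frame(
  sample = rep(c("lib1", "lib2"), each = 3),
  depth  = rep(c(0, 500, 1000), times = 2),
  sat    = c(0, 120, 150, 0, 80, 130)
)

# One row per depth and one column per sample:
# each column can be read as that sample's saturation curve.
curves <- tapply(sat_df$sat, list(sat_df$depth, sat_df$sample), mean)
curves
```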