purge_phantoms | R Documentation |
Step 5: Reassign hopped reads and purge phantom molecules
purge_phantoms(out, torc, return_readcounts = FALSE, return_discarded = TRUE)
out | The output list from the previous steps |
torc | TOR cutoff |
return_readcounts | If TRUE, the joined read counts table is returned |
return_discarded | If TRUE, the discarded data are returned |
For each outcome, the conditional posterior probability q | y of each possible true
sample of origin is computed by plugging in \pi_r, the estimated proportion of molecules
across samples. The index of the sample with the maximum posterior probability, along with
that posterior probability, is added to the original joined read count table.
The predicted true sample of origin and its associated posterior probability are then used to reassign
reads to their predicted sample of origin.
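The reassignment step can be sketched for a single molecule as follows. This is a minimal illustration, not the package's implementation: it assumes a multinomial mixture in which each read keeps its true sample index with probability `p` and hops uniformly to one of the other samples otherwise. The helper `posterior_origin`, the value of `p`, and the toy vectors `y` and `pi_r` are all hypothetical.

```r
# Hypothetical helper: posterior over the true sample of origin for one molecule.
# y    : read counts of the molecule across the S samples
# pi_r : estimated proportion of molecules in each sample
# p    : assumed probability that a read keeps its true sample index
posterior_origin <- function(y, pi_r, p) {
  S <- length(y)
  # Log posterior weights up to an additive constant (assumed mixture model)
  log_w <- log(pi_r) + y * (log(p) - log((1 - p) / (S - 1)))
  w <- exp(log_w - max(log_w))  # subtract the maximum for numerical stability
  q <- w / sum(w)
  list(s = which.max(q), q = max(q), posterior = q)
}

y    <- c(9, 1, 0, 0)            # toy read counts (assumption)
pi_r <- c(0.4, 0.3, 0.2, 0.1)    # toy molecule proportions (assumption)
res  <- posterior_origin(y, pi_r, p = 0.99)

# Reassign all reads of the molecule to the predicted sample of origin
reassigned <- replace(numeric(length(y)), res$s, sum(y))
```

Here `res$s` is the index of the predicted sample of origin and `res$q` its posterior probability, the two quantities that the text describes as being appended to the joined read count table.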
To remove predicted phantom molecules from the data while minimizing the rates of false positives (FPs) and false negatives (FNs),
the Trade-Off Ratio (TOR) statistic is computed by dividing the marginal increase in FNs by the marginal decrease
in FPs for each observed unique qr* value. The cutoff that is effectively applied, TOR*, is the largest
observed TOR value not exceeding the preset TOR cutoff (default value is 3). All molecules with TOR values
strictly less than TOR* (not the preset TOR cutoff) are discarded. For example, if tor = (0.1, 0.5, 2.9, 4.1, ...)
and the TOR cutoff is 3, then TOR* = 2.9, and the predicted real molecules corresponding to tor = 0.1 and tor = 0.5
are discarded.
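The cutoff rule in the example above can be reproduced in a few lines of R. This sketch uses only the four TOR values listed in the example; the variable names are illustrative and are not taken from the package.

```r
tor  <- c(0.1, 0.5, 2.9, 4.1)  # the observed TOR values listed in the example
torc <- 3                      # preset TOR cutoff

# Effective cutoff: the largest observed TOR value not exceeding the preset cutoff
tor_star <- max(tor[tor <= torc])

# Molecules with TOR strictly below the effective cutoff are discarded
discard <- tor < tor_star
```

With these inputs `tor_star` is 2.9, and only the molecules with TOR values 0.1 and 0.5 are flagged in `discard`.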
To purge the data, the read counts are first deduplicated to obtain a table of molecule (i.e. UMI) counts.
After purging, the molecule counts are collapsed over gene labels to produce a gene-by-cell UMI-count expression matrix for each of the samples sequenced in the same lane.
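The deduplicate-then-collapse step can be sketched in base R for one sample. The data frame `reads`, its column names, and its values are hypothetical, and the sketch assumes that a molecule left with zero reads after reassignment no longer belongs to the sample.

```r
# Hypothetical read-count table for one sample after reassignment
reads <- data.frame(
  cell  = c("c1", "c1", "c1", "c1", "c2", "c2"),
  gene  = c("g1", "g1", "g1", "g2", "g1", "g2"),
  umi   = c("AAT", "CCG", "TTA", "AAT", "GGA", "TTC"),
  count = c(5, 0, 2, 2, 3, 1)
)

# Deduplicate: every cell-gene-UMI combination with at least one read is one molecule
molecules <- unique(reads[reads$count > 0, c("cell", "gene", "umi")])

# Collapse over UMIs to get a gene-by-cell matrix of molecule (UMI) counts
umi_counts <- table(gene = molecules$gene, cell = molecules$cell)
```

In this toy table, gene `g1` in cell `c1` ends up with two molecules, because the UMI `CCG` has no reads left in this sample.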
Returns the initial list out, with umi_counts and summary_stats added.