View source: R/Dropout_correction.R
CorrectDropout | R Documentation |
Dropout is the name given to a phenomenon originally identified by our lab and
further detailed in two independent publications (Zimmer et al., 2023;
Berg et al., 2023).
Dropout is the under-representation of reads from RNA containing metabolic label
(most commonly 4-thiouridine or 6-thioguanosine). Two likely causes have been proposed:
loss of 4-thiouridine (s4U) containing RNA on plastic surfaces, and RT dropoff caused by
modifications that recoding chemistry introduces on s4U. While protocols can be altered to
drastically reduce dropout, you may still have datasets collected with suboptimal handling
that you want to analyze with bakR. That is where CorrectDropout comes in.
CorrectDropout(
obj,
scale_init = 1.05,
pdo_init = 0.3,
recalc_uncertainty = FALSE,
...
)
obj |
bakRFit object |
scale_init |
Numeric; initial estimate for the -s4U/+s4U scale factor. This is the factor difference in RPM-normalized read counts for completely unlabeled transcripts (i.e., highly stable transcripts) between the +s4U and -s4U samples. |
pdo_init |
Numeric; initial estimate for the dropout rate. This is the probability that an s4U labeled RNA molecule is lost during library preparation. |
recalc_uncertainty |
Logical; if TRUE, then the fraction new (fn) uncertainty is recalculated using the dropout-adjusted fn and a simple binomial model of estimate uncertainty. This will slightly underestimate the fn uncertainty, but will be far less biased for low coverage features or for samples with low pnews. |
... |
Additional (optional) parameters to be passed to |
CorrectDropout estimates the fraction of 4-thiouridine containing RNA
that was lost during library preparation (pdo). It then uses this estimate of pdo
to correct fraction new estimates and read counts. Both corrections are analytically
derived from a rigorous generative model of NR-seq data. Importantly, the read count
correction preserves the total library size to avoid artificially inflating read counts.
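As a rough illustration of the kind of correction described above, the sketch below inverts a deliberately simple dropout model in base R. It assumes that each read from labeled RNA is lost independently with probability pdo and that reads from unlabeled RNA are never lost. The helper names correct_fn and correct_counts are hypothetical and are not part of bakR, and the actual CorrectDropout implementation may differ.

```r
# Hypothetical helper (not part of bakR).
# Under the assumed model, the observed fraction new is
#   fn_obs = fn * (1 - pdo) / (fn * (1 - pdo) + (1 - fn)),
# which rearranges to the expression returned below.
correct_fn <- function(fn_obs, pdo) {
  stopifnot(all(fn_obs >= 0 & fn_obs <= 1), pdo >= 0, pdo < 1)
  fn_obs / ((1 - pdo) + fn_obs * pdo)
}

# Hypothetical helper (not part of bakR).
# Each feature is expected to retain a fraction (1 - fn * pdo) of its reads,
# so counts are scaled up by that factor and then renormalized so that the
# total library size is unchanged.
correct_counts <- function(counts, fn, pdo) {
  retained <- 1 - fn * pdo
  adjusted <- counts / retained
  adjusted * sum(counts) / sum(adjusted)
}

# Worked example with an assumed 40% dropout rate
pdo <- 0.4
fn_true <- 0.5

# Fraction new that would be observed after dropout under the assumed model
fn_obs <- fn_true * (1 - pdo) / (fn_true * (1 - pdo) + (1 - fn_true))

# Inverting the model recovers the true fraction new
fn_corrected <- correct_fn(fn_obs, pdo)

# Two features: one half new, one completely unlabeled
counts <- c(800, 1000)
counts_corrected <- correct_counts(counts, fn = c(0.5, 0), pdo = pdo)
```

In this sketch the feature with labeled RNA gains reads relative to the unlabeled feature, while sum(counts_corrected) equals sum(counts), which mirrors the library-size-preserving behavior described above.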
A bakRFit or bakRFnFit object (same type as was passed in). Fraction new estimates and read counts
in Fast_Fit$Fn_Estimates and (in the case of a bakRFnFit input) Data_lists$Fn_Est are dropout corrected.
A count matrix with corrected read counts (Data_lists$Count_Matrix_corrected) is also output, along with a
data frame with information about the dropout rate estimated for each sample (Data_lists$Dropout_df).
# Simulate data for 500 genes and 2 replicates with 40% dropout
sim <- Simulate_relative_bakRData(500, 100000, nreps = 2, p_do = 0.4)
# Fit data with fast implementation
Fit <- bakRFit(sim$bakRData)
# Correct for dropout
Fit <- CorrectDropout(Fit)