CorrectDropout: Correcting for metabolic labeling induced RNA dropout

View source: R/Dropout_correction.R

CorrectDropout R Documentation

Correcting for metabolic labeling induced RNA dropout

Description

Dropout is the name given to a phenomenon originally identified by our lab and further detailed in two independent publications (Zimmer et al. (2023) and Berg et al. (2023)). Dropout is the under-representation of reads from RNA containing metabolic label (most commonly 4-thiouridine or 6-thioguanosine). Two likely causes have been proposed: loss of 4-thiouridine (s4U) containing RNA on plastic surfaces, and RT dropoff caused by modifications that recoding chemistry introduces on s4U. While protocols can be altered to drastically reduce dropout, you may still have datasets collected with suboptimal handling that you want to analyze with bakR. That is where CorrectDropout comes in.

Usage

CorrectDropout(
  obj,
  scale_init = 1.05,
  pdo_init = 0.3,
  recalc_uncertainty = FALSE,
  ...
)

Arguments

obj

bakRFit or bakRFnFit object

scale_init

Numeric; initial estimate for the -s4U/+s4U scale factor. This is the fold difference in RPM normalized read counts between the +s4U and -s4U samples for completely unlabeled transcripts (i.e., highly stable transcripts).

pdo_init

Numeric; initial estimate for the dropout rate. This is the probability that an s4U labeled RNA molecule is lost during library preparation.

recalc_uncertainty

Logical; if TRUE, then fraction new uncertainty is recalculated using adjusted fn and a simple binomial model of estimate uncertainty. This will provide a slight underestimate of the fn uncertainty, but will be far less biased for low coverage features, or for samples with low pnews.
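
As a rough illustration of what a simple binomial model of estimate uncertainty looks like, the sketch below computes a textbook binomial standard error for a fraction new estimate, together with its delta-method counterpart on the logit scale. The helper binomial_fn_se() and its arguments are hypothetical and are not part of bakR; the exact calculation used by CorrectDropout may differ.

```r
# Hypothetical helper (not part of bakR): generic binomial standard error
# for a fraction new estimate `fn` based on `n_reads` reads.
binomial_fn_se <- function(fn, n_reads) {
  # Standard error of a binomial proportion
  se_fn <- sqrt(fn * (1 - fn) / n_reads)
  # Delta-method standard error on the logit(fn) scale
  se_logit <- 1 / sqrt(n_reads * fn * (1 - fn))
  data.frame(fn = fn, n_reads = n_reads, se_fn = se_fn, se_logit = se_logit)
}

# Toy inputs: same fn at low and high coverage, plus a low fn at low coverage
unc <- binomial_fn_se(fn = c(0.5, 0.5, 0.05), n_reads = c(100, 10000, 100))
print(unc)
```

In this approximation the uncertainty depends only on the adjusted fn and the read count, so it shrinks as coverage increases.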

...

Additional (optional) parameters to be passed to stats::nls()

Details

CorrectDropout estimates the fraction of 4-thiouridine containing RNA that was lost during library preparation (pdo). It then uses this estimate of pdo to correct fraction new estimates and read counts. Both corrections are analytically derived from a generative model of NR-seq data. Importantly, the read count correction preserves the total library size, so read counts are not artificially inflated.
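
To make the idea of the two corrections concrete, here is a minimal sketch under a simplified assumption: every new RNA molecule is labeled and is lost independently with probability pdo, while old RNA is never lost. Under that assumption an observed fraction new fn_obs relates to the true value fn by fn_obs = fn * (1 - pdo) / (1 - fn * pdo), which can be inverted directly. The helpers correct_fn() and correct_counts() are hypothetical names for illustration and are not bakR's internal code.

```r
# Hypothetical helper (not part of bakR): invert the simplified dropout
# model to recover the fraction new from the observed one.
correct_fn <- function(fn_obs, pdo) {
  fn_obs / (1 - pdo + fn_obs * pdo)
}

# Hypothetical helper (not part of bakR): undo the per-feature read loss,
# then rescale so that the total library size is unchanged.
correct_counts <- function(reads, fn_obs, pdo) {
  fn_true <- correct_fn(fn_obs, pdo)
  # Expected surviving fraction of a feature's reads is 1 - fn_true * pdo
  adjusted <- reads / (1 - fn_true * pdo)
  adjusted * sum(reads) / sum(adjusted)
}

# Toy data: three features with known fraction new and 40% dropout
pdo <- 0.4
fn_true <- c(0.1, 0.5, 0.9)
fn_obs <- fn_true * (1 - pdo) / (1 - fn_true * pdo)

fn_corrected <- correct_fn(fn_obs, pdo)
reads <- c(1000, 1000, 1000)
reads_corrected <- correct_counts(reads, fn_obs, pdo)

print(data.frame(fn_obs, fn_corrected, reads, reads_corrected))
```

In this toy model, features with a high fraction new gain reads and stable features lose reads after rescaling, while the library total stays the same.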

Value

A bakRFit or bakRFnFit object (same type as was passed in). Fraction new estimates and read counts in Fast_Fit$Fn_Estimates and (in the case of a bakRFnFit input) Data_lists$Fn_Est are dropout corrected. A count matrix with corrected read counts (Data_lists$Count_Matrix_corrected) is also output, along with a data frame with information about the dropout rate estimated for each sample (Data_lists$Dropout_df).

Examples


# Simulate data for 500 genes and 2 replicates with 40% dropout
sim <- Simulate_relative_bakRData(500, 100000, nreps = 2, p_do = 0.4)

# Fit data with fast implementation
Fit <- bakRFit(sim$bakRData)

# Correct for dropout
Fit <- CorrectDropout(Fit)



bakR documentation built on June 22, 2024, 6:55 p.m.