knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.dim = c(6, 4) )
Install and load bakR to run the code in this vignette; instructions on how to do so can be found at this link. Also, you'll want to set the seed so as to ensure the results you get reproduce those presented in the vignette.
library(bakR) set.seed(123)
The 1st step to using bakR is to create a bakRData
object. A bakRData object
consists of two components: a cB data frame and a metadf data frame.
cB stands for counts binomial and contains all of the information about
mutations seen in sequencing reads in each sample sequenced. metadf stands
for metadata data frame and contains important information about the
experimental details of each sample (i.e., how long the metabolic label
feed was, which samples are reference samples, and which are experimental
samples). Examples of what these data structures are available via calls to
data("cB_small")
and data("metadf")
, for the cB and metadf respectively.
A cB data frame consists of rows corresponding to groups of reads with identical data, where data corresponds to the values of the following four variables:
The last column of a cB data frame is n: Number of reads with identical data for the other 4 columns.
cB data frames are most easily obtained from the Snakemake implementation of a pipeline developed by our lab, called bam2bakR (available here).
The metadf has two columns:
The metadf data frame must also have row names corresponding to the sample name that the tl and Exp_ID entries describe, as the sample name appears in the cB data frame.
Once you have these two data frames correctly constructed, you can create a bakRData object:
# Load data data("cB_small") data("metadf") # Create bakRData object bakRData <- bakRData(cB_small, metadf)
bakR contains three different implementations of a statistical model of NR-seq data. You must always first run the most efficient implementation. The other two implementations can then be run using the output of this implementation. Below I will show how to run the efficient implementation using simulated data:
# Simulate a nucleotide recoding dataset sim_data <- Simulate_bakRData(500) # This will simulate 500 features, 2 experimental conditions # and 3 replicates for each experimental condition # See ?Simulate_bakRData for details regarding tunable parameters # Extract simulated bakRData object bakRData <- sim_data$bakRData # Extract simualted ground truths sim_truth <- sim_data$sim_list # Run the efficient model Fit <- bakRFit(bakRData)
bakRFit() is used here as a wrapper for two functions in bakR: cBprocess() and fast_analysis(). For
more details on what these functions do, run ?cBprocess
or fast_analysis
. Alternatively,
see the more highly detailed version of this vignette for additional details.
While the fast implementation was running, it outputted a message regarding the estimated pnews and pold. The pnews are the estimated mutation rates of reads from new RNAs (new meaning RNAs synthesized after the start of s^4^U labeling) in each sample (muts = Exp_ID, and reps = a numerical replicate ID that corresponds to the order replicates appear in the cB), and polds are the estimates for the background mutation rate used in all analyses. For more details on how bakR estimates these mutation rates and alternative estimation strategies implemented in bakR, see the longer form version of this vignette as well as the vignette on troubleshooting analyses.
To run the heavier, more highly powered models, just rerun bakRFit() on the Fit object, but with either the StanFit or HybridFit parameters set to true.
# Run Hybrid model (This might take several minutes to run) Fit <- bakRFit(Fit, HybridFit = TRUE) # Run Full model (This might take ~10-30 minutes to run) Fit <- bakRFit(Fit, StanFit = TRUE)
The Fit objects contain lists pertaining to the fits of each of the models. The possible contents include:
bakR provides a variety of easy to use functions for beginning to investigate your data. The visualizations are particularly aimed at revealing trends in RNA stabilization or destabilization. These include MA plots:
## MA Plot with Fast Fit bakR::plotMA(Fit, Model = "MLE")
Volcano plots:
## Volcano Plot with Fast Fit; significance assessed relative to an FDR control of 0.05 plotVolcano(Fit$Fast_Fit)
and PCA plots:
## 2D PCA plot with replicate fraction news # The equivalent function prior to version 1.0.0 is FnPCA, now deprecated in # favor of FnPCA2. FnPCA2(Fit, Model = "MLE")
This vignette provides the minimal amount of information to get up and running with bakR. If you would like a more thorough discussion of each step of this process, check out the long form version of this vignette ("Differential kinetic analysis with bakR"). In addition, there are a number of other vignettes that cover various topics not discussed in these intro vignettes:
Note, all but the "Differential synthesis rate analysis" vignette are only fully compatible with version 1.0.0 of bakR. Update bakR as necessary
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.