srnadiff | R Documentation |
srnadiff is a package that finds differentially expressed regions
from RNA-seq data at base-resolution level without relying on
existing annotation. To do so, the package implements the
identify-then-annotate methodology, which combines two pipelines:
detection of differentially expressed regions and quantification of
differential expression.
This is the main wrapper for running several key functions from this
package. It is meant to be used after an srnadiffExp object has been
created. srnadiff implements four methods to produce potential DERs
(see Details). Once DERs are detected, the second step in srnadiff is
to quantify their statistical significance.
srnadiff(
  object,
  segMethod = c("hmm", "IR"),
  diffMethod = "DESeq2",
  useParameters = srnadiffDefaultParameters,
  nThreads = 1
)
object |
An |
segMethod |
A character vector. The segmentation methods to use,
one of |
diffMethod |
A character. The differential expression testing
method to use, one of |
useParameters |
A named list containing the methods parameters to use.
If missing, default parameter values are supplied.
See |
nThreads |
|
The srnadiff package implements two major methods to produce
potential differentially expressed regions: the HMM and IR methods.
Briefly, these methods identify contiguous base pairs in the genome
that show a differential expression signal; these are then regrouped
into genomic intervals called differentially expressed regions (DERs).
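To make the regrouping step concrete, here is a minimal base R sketch that thresholds a per-base signal and merges contiguous bases into intervals. It is an illustration only, not the package's implementation: the coverage values, the pseudo-count of 1 and the threshold `h` are all made-up assumptions.

```r
# Toy per-base mean normalized coverage for two conditions (made-up values).
covA <- c(1, 1, 8, 9, 10, 1, 1, 1, 12, 11, 1)
covB <- c(1, 1, 1, 1,  1, 1, 1, 1,  1,  1, 1)

# Absolute log2-ratio per base. The pseudo-count of 1 is an assumption
# made here only to avoid division by zero.
logRatio <- abs(log2((covA + 1) / (covB + 1)))

# Hypothetical fixed threshold on the log-ratio.
h <- 1.5
above <- logRatio > h

# Merge runs of consecutive bases above the threshold into intervals.
runs   <- rle(above)
ends   <- cumsum(runs$lengths)
starts <- ends - runs$lengths + 1
candidateRegions <- data.frame(start = starts[runs$values],
                               end   = ends[runs$values])
candidateRegions
```

With these toy values, two candidate intervals are returned (bases 3 to 5 and bases 9 to 10).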
Once DERs are detected, the second step in an sRNA-diff approach is to
quantify their statistical significance. To do so, reads (including
fractions of reads) that overlap each expressed region are counted to
arrive at a count matrix with one row per region and one column per sample.
This count matrix is then analyzed using the standard DESeq2 workflow
for differential expression of RNA-seq data, assigning a p-value to
each candidate DER. Alternatively, other methods (edgeR, baySeq) can
be used.
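The shape of the count matrix described above can be sketched in base R as follows. This is a simplified illustration under stated assumptions, not how srnadiff counts reads internally: the data frames `ders` and `reads`, the sample names and all coordinates are hypothetical, and a read is counted for a region as soon as it overlaps it by at least one base.

```r
# Hypothetical candidate regions (one row per region).
ders <- data.frame(start = c(100, 300), end = c(150, 340))

# Hypothetical aligned reads, tagged with the sample they come from.
reads <- data.frame(
    sample = c("ctrl1", "ctrl1", "ctrl2", "treat1", "treat1", "treat1", "treat2"),
    start  = c(95, 310, 120, 101, 130, 320, 500),
    end    = c(115, 330, 140, 121, 150, 345, 520)
)
samples <- unique(reads$sample)

# For each sample, count the reads overlapping each region
# (partial overlaps are counted too).
counts <- sapply(samples, function(s) {
    r <- reads[reads$sample == s, ]
    sapply(seq_len(nrow(ders)), function(i) {
        sum(r$start <= ders$end[i] & r$end >= ders$start[i])
    })
})
rownames(counts) <- paste0("region", seq_len(nrow(ders)))

# One row per region, one column per sample.
counts
```

A matrix of this shape is what count-based tools such as DESeq2 take as input.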
The main functions for finding differentially expressed regions are
srnadiffExp and srnadiff. The first one creates an S4 class providing
the infrastructure (slots) to store the input data, method parameters,
intermediate calculations and results of an sRNA-diff approach. The
second one implements four methods to find candidate differentially
expressed regions and quantifies the statistical significance of the
regions found. Details about the implemented methods are further
described in the vignette and the manual page of the srnadiff function.
Implemented methods to produce potential differentially expressed
regions in srnadiff are:
annotation:
This method simply provides the genomic regions corresponding to the annotation file that is optionally given by the user. It can be a set of known miRNAs, siRNAs, piRNAs, genes, or a combination thereof.
hmm:
This approach assumes that contiguous regions of RNA
along the chromosome are either "differentially expressed" or "not".
This is captured with a hidden Markov model (HMM) with a binary latent
state for each nucleotide: differentially expressed or
not differentially expressed. The observations of the HMM are
then the empirical p-values arising from the differential expression
analysis corresponding to each nucleotide position.
The HMM approach normally needs emission, transition, and starting
probability values (see parameters). They can be tuned by the user.
To find the most likely sequence of states from the HMM, the Viterbi
algorithm is run. This essentially segments the genome into regions,
where a region is defined as a set of consecutive bases showing a
common expression signature.
IR:
In this approach, for each base, the normalized coverage is averaged across all samples within each condition. This generates one vector of (normalized) mean coverage per condition. These two vectors are then used to compute per-nucleotide log-ratios (in absolute value) across the genome. For the computed log-ratios, the method uses a sliding threshold h that runs across the log-ratio levels, identifying bases with a log-ratio value above h. Regions of contiguous bases passing this threshold are then analyzed using an adaptation of the Aumann and Lindell algorithm for the irreducibility property (Aumann and Lindell (2003)).
naive:
This method is the simplest: given a fixed threshold h, contiguous bases with log-ratio expression (in absolute value) passing this threshold are considered candidate differentially expressed regions.
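The Viterbi decoding mentioned for the hmm method can be sketched in base R as below. This is a generic two-state Viterbi under loud assumptions, not the package's code: the function name `viterbiSketch`, the probability values, and the discretization of p-values into two symbols with a 0.05 cutoff are all hypothetical choices made for illustration.

```r
# Generic Viterbi decoding in log space.
# obs:   integer vector of observed symbols (column indices of 'emis')
# start: starting probabilities, one per state
# trans: transition matrix, trans[i, j] = P(state j | previous state i)
# emis:  emission matrix, emis[i, s] = P(symbol s | state i)
viterbiSketch <- function(obs, start, trans, emis) {
    n <- length(obs)
    k <- length(start)
    delta <- matrix(-Inf, nrow = k, ncol = n)  # best log-probability per state
    psi   <- matrix(0L,   nrow = k, ncol = n)  # back-pointers
    delta[, 1] <- log(start) + log(emis[, obs[1]])
    for (t in seq_len(n)[-1]) {
        for (j in seq_len(k)) {
            cand        <- delta[, t - 1] + log(trans[, j])
            psi[j, t]   <- which.max(cand)
            delta[j, t] <- max(cand) + log(emis[j, obs[t]])
        }
    }
    # Trace back the most likely state sequence.
    path    <- integer(n)
    path[n] <- which.max(delta[, n])
    for (t in rev(seq_len(n - 1))) {
        path[t] <- psi[path[t + 1], t + 1]
    }
    path
}

# Made-up per-base p-values, turned into two symbols:
# 1 = "not significant", 2 = "significant" (assumed cutoff of 0.05).
pvals <- c(0.8, 0.6, 0.01, 0.02, 0.7, 0.03, 0.01, 0.9, 0.5, 0.7)
obs   <- ifelse(pvals < 0.05, 2L, 1L)

# State 1 = not differentially expressed, state 2 = differentially expressed.
start <- c(0.9, 0.1)
trans <- matrix(c(0.9, 0.1,
                  0.1, 0.9), nrow = 2, byrow = TRUE)
emis  <- matrix(c(0.9, 0.1,
                  0.2, 0.8), nrow = 2, byrow = TRUE)

path <- viterbiSketch(obs, start, trans, emis)
path
```

With these toy values, bases 3 to 7 are decoded as differentially expressed: the single non-significant base at position 5 is absorbed into the surrounding region because the transition probabilities favor staying in the same state.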
An srnadiffExp object containing additional slots for:
regions
parameters
countMatrix
Matthias Zytnicki and Ignacio González
Aumann Y. and Lindell Y. (2003). A Statistical Theory for Quantitative Association Rules. Journal of Intelligent Information Systems, 20(3):255-283.
regions, parameters, countMatrix and srnadiffExp
## A typical srnadiff session might look like the following.
## Here we assume that 'bamFiles' is a vector with the full
## paths to the BAM files and the sample and experimental
## design information are stored in a data frame 'sampleInfo'.

## Not run:
#-- Data preparation
srnaExp <- srnadiffExp(bamFiles, sampleInfo)

#-- Detecting DERs and quantifying differential expression
srnaExp <- srnadiff(srnaExp)

#-- Visualization of the results
plotRegions(srnaExp, regions(srnaExp)[1])

## End(Not run)

srnaExp <- srnadiffExample()
srnaExp <- srnadiff(srnaExp)
srnaExp