countReads-methods | R Documentation |
countReads
counts the reads assigned to a SplicingGraphs object.
The counting can be done by splicing graph edge, reduced splicing
graph edge, transcript, or gene.
reportReads
is similar to countReads
but returns right before
the final counting step, that is, the returned DataFrame contains the reads
instead of their counts.
countReads(x, by=c("sgedge", "rsgedge", "tx", "gene"))
reportReads(x, by=c("sgedge", "rsgedge", "tx", "gene"))
x |
A SplicingGraphs object. |
by |
Can be |
countReads
and reportReads
allow summarization of the reads
at different levels of resolution. The level of resolution is determined
by the type of feature that one chooses via the by
argument.
The supported resolutions are (from highest to lowest resolution):
by="sgedge"
for summarization at the splicing graph edge
level (i.e. at the exons/intron level);
by="rsgedge"
for summarization at the reduced
splicing graph edge level;
by="tx"
for summarization at the transcript level;
by="gene"
for summarization at the gene level.
There is a parent-child relationship between the features corresponding to a given level of resolution (the parent features) and those corresponding to a higher level of resolution (the child features).
For example, in the case of the 2 first levels of resolution listed above, the parent-child relationship is the following: the parent features are the reduced splicing graph edges, the child features are the splicing graph edges, and each parent feature is obtained by merging one or more child features together. Similarly, transcripts can be seen as parent features of reduced splicing graph edges, and genes as parent features of transcripts. Note that, the rsgedge/sgedge and gene/tx relationships are one-to-many, but the tx/rsgedge relationship is many-to-many because a given edge can belong to more than one transcript.
Finally the parent-child relationships between 2 arbitrary levels of resolution is defined by combining the relationships between consecutive levels. All possible parent-child relationships are summarized in the following table:
| to: sgedge | to: rsgedge | to: tx --------------+--------------+--------------+------------ from: rsgedge | one-to-many | | from: tx | many-to-many | many-to-many | from: gene | one-to-many | one-to-many | one-to-many
An important distinction needs to be made between a read that hits a given feature multiple times and a read that hits more than one feature.
If the former, the read is counted/reported only once for that feature. For example, when summarizing at the transcript level, a read is counted/reported only once for a given transcript, even if that read hits more than one splicing graph edge (or reduced splicing graph edge) associated with that transcript.
If the latter, the read is said to be ambiguous. An ambiguous read is currently counted/reported for each feature where it has a hit. This is a temporary situation: in the near future the user will be offered options to handle ambiguous reads in different ways.
A read might be ambiguous at one level of resolution but not at the other.
Also the number of ambiguous reads is typically affected by the level
of resolution. However, even though higher resolution generally means
more ambiguous reads, this is only true when the switch from one level
of resolution to the other implies a parent-child relationship between
features that is one-to-many.
So, based on the above table, this is always true, except when
switching from using by="tx"
to using by="sgedge"
or
by="rsgedge"
. In those cases, the switch can produce more
ambiguities but it can also produce less.
A DataFrame object with one row per:
unique splicing graph edge, if by="sgedge"
;
unique reduced splicing graph edge, if by="rsgedge"
;
transcript if by="tx"
;
gene if by="gene"
.
And with one column per sample (containing the counts for that sample for
countReads
, and the reads for that sample for reportReads
),
plus the two following left columns:
if by="sgedge"
: "sgedge_id"
, containing the
global splicing graph edge ids, and "ex_or_in"
,
containing the type of edge (exon or intron);
if by="rsgedge"
: "rsgedge_id"
, containing the
global reduced splicing graph edge ids, and
"ex_or_in"
, containing the type of edge (exon, intron,
or mixed);
if by="tx"
: "tx_id"
and "gene_id"
;
if by="gene"
: "gene_id"
and "tx_id"
.
For countReads
, each column of counts is of type integer and
is named after the corresponding sample.
For reportReads
, each column of reads is a CharacterList object
and its name is the name of the corresponding sample with the
".hits"
suffix added to it.
In both cases, the name of the sample is the name that was passed to
assignReads
when the reads of a given sample were initially
assigned. See ?assignReads
for more information.
H. Pagès
This man page is part of the SplicingGraphs package.
Please see ?`SplicingGraphs-package`
for an overview of the
package and for an index of its man pages.
## ---------------------------------------------------------------------
## 1. Make SplicingGraphs object 'sg' from toy gene model and assign toy
## reads to it (see '?assignReads')
## ---------------------------------------------------------------------
example(assignReads)
## ---------------------------------------------------------------------
## 2. Summarize the reads by splicing graph edge
## ---------------------------------------------------------------------
countReads(sg)
reportReads(sg)
## ---------------------------------------------------------------------
## 3. Summarize the reads by reduced splicing graph edge
## ---------------------------------------------------------------------
countReads(sg, by="rsgedge")
reportReads(sg, by="rsgedge")
## ---------------------------------------------------------------------
## 4. Summarize the reads by transcript
## ---------------------------------------------------------------------
countReads(sg, by="tx")
reportReads(sg, by="tx")
## ---------------------------------------------------------------------
## 5. Summarize the reads by gene
## ---------------------------------------------------------------------
countReads(sg, by="gene")
reportReads(sg, by="gene")
## ---------------------------------------------------------------------
## 6. A close look at ambiguous reads
## ---------------------------------------------------------------------
resolutions <- c("sgedge", "rsgedge", "tx", "gene")
reported_reads <- lapply(resolutions,
function(by) {
reported_reads <- reportReads(sg, by=by)
unlist(reported_reads$TOYREADS.hits)
})
## The set of reported reads is the same at all levels of resolution:
unique_reported_reads <- lapply(reported_reads, unique)
stopifnot(identical(unique_reported_reads,
rep(unique_reported_reads[1], 4)))
## Extract ambigous reads for each level of resolution:
ambiguous_reads <- lapply(reported_reads,
function(x) unique(x[duplicated(x)]))
names(ambiguous_reads) <- resolutions
ambiguous_reads
## Reads that are ambiguous at the "rsgedge" level must also be
## ambiguous at the "sgedge" level:
stopifnot(all(ambiguous_reads$rsgedge %in% ambiguous_reads$sgedge))
## However, there is no reason why reads that are ambiguous at the
## "tx" level should also be ambiguous at the "sgedge" or "rsgedge"
## level!
## ---------------------------------------------------------------------
## 7. Remove the reads from 'sg'.
## ---------------------------------------------------------------------
sg <- removeReads(sg)
countReads(sg)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.