Description
A class for representing reads from next-generation sequencing experiments that have been aligned to genomic intervals.
Objects from the Class
Objects can be created in one of three ways:
1. by calls of the form new("AlignedGenomeIntervals", .Data, closed, ...);
2. by using the auxiliary function AlignedGenomeIntervals and supplying separate vectors of the same length that hold the required information:
   AlignedGenomeIntervals(start, end, chromosome, strand, reads, matches, sequence)
   If the arguments reads or matches are not specified, they are assumed to be 1 for all intervals;
3. or, probably the most common way, by coercing from objects of class AlignedRead.
Slots
.Data: two-column integer matrix, holding the start and end coordinates of the intervals on the chromosomes
sequence: character; sequence of the read aligned to the interval
reads: integer; total number of reads that were aligned to this interval
matches: integer; the total number of genomic intervals to which the reads aligned to this interval were aligned. A value of 1 thus means that this read sequence matches uniquely to this one genomic interval
organism: string; an identifier of the organism whose genome the intervals refer to. Functions making use of this slot require a specific annotation package org.<organism>.eg.db. For example, if organism is 'Hs', the annotation package 'org.Hs.eg.db' is utilised by these functions. The annotation packages can be obtained from the Bioconductor repositories.
annotation: data.frame; see class Genome_intervals for details
closed: matrix; see class Genome_intervals for details
type: character; see class Genome_intervals for details
score: numeric; optional score for each aligned genome interval
id: character; optional identifier for each aligned genome interval
chrlengths: integer; optional named integer vector of chromosome lengths for the respective genome; if present, it is used in place of the chromosome lengths retrieved from the annotation package (see slot organism)
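The relationship between the reads and matches slots can be illustrated with a small base-R sketch. It assumes a hypothetical data frame with one row per alignment of a read to a genomic interval (columns sequence, chromosome, start, end, strand); the helper count_reads_matches is made up for illustration and is not how girafe fills these slots.

```r
# Hypothetical helper, not part of girafe: derive 'reads' and 'matches'
# values from one row per read-to-interval alignment (assumed input layout).
count_reads_matches <- function(aln) {
  seqs <- as.character(aln$sequence)
  # a key identifying each distinct genomic interval
  interval_key <- paste(aln$chromosome, aln$strand, aln$start, aln$end,
                        sep = ":")
  # 'matches': number of distinct intervals each read sequence aligns to
  n_per_seq <- tapply(interval_key, seqs, function(k) length(unique(k)))
  aln$matches <- as.integer(n_per_seq[seqs])
  # one output row per (interval, sequence); 'reads' counts its alignments
  key <- paste(interval_key, seqs, sep = "|")
  first <- !duplicated(key)
  out <- aln[first, c("chromosome", "start", "end", "strand",
                      "sequence", "matches")]
  out$reads <- as.integer(table(key)[key[first]])
  rownames(out) <- NULL
  out
}

# usage: sequence "ACGT" (3 reads) aligns to two distinct intervals,
# sequence "TTGA" (1 read) aligns to a single interval
aln <- data.frame(
  sequence   = c("ACGT", "ACGT", "ACGT", "TTGA"),
  chromosome = c("chr1", "chr1", "chr2", "chr1"),
  start = c(10L, 10L, 50L, 100L), end = c(13L, 13L, 53L, 103L),
  strand = c("+", "+", "-", "+"), stringsAsFactors = FALSE)
count_reads_matches(aln)
```

In this sketch the two "ACGT" intervals both get matches = 2, since that read sequence is not unique to either of them, while the "TTGA" interval gets matches = 1.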
Extends
Class "Genome_intervals", directly.
Class "Intervals_full", by class "Genome_intervals", distance 2.
Methods
coerce: coercion method from objects of class AlignedRead, which is defined in package ShortRead, to objects of class AlignedGenomeIntervals
coverage: signature("AlignedGenomeIntervals"): computes the read coverage over all chromosomes. If the organism of the object is set correctly, the chromosome lengths are retrieved from the appropriate annotation package; otherwise, the maximum interval end is taken to be the absolute length of that chromosome (strand).
The result of this method is a list whose individual elements are of class Rle, a class for encoding long repetitive vectors that is now defined in package S4Vectors (originally in package IRanges).
The additional argument byStrand governs whether the coverage is computed separately for each strand. If byStrand=FALSE (default), only one result is returned per chromosome. If byStrand=TRUE, the result contains two separate Rle objects per chromosome, with the strand appended to the chromosome name.
detail: signature("AlignedGenomeIntervals"): a more detailed output of all the intervals than that provided by show; only advisable for objects containing few intervals
extend: signature("AlignedGenomeIntervals") with additional arguments fiveprime=0L and threeprime=0L. These must be integers greater than or equal to 0. They specify by how much each interval is extended at its 5' end and at its 3' end, respectively, i.e. how much is subtracted from its left border and added to its right border. Which border is the 5' end and which is the 3' end is determined from the strand information of the object.
Lastly, if the object has an organism annotation, it is checked that the right ends of the intervals do not exceed the respective chromosome lengths.
export: exports the aligned intervals as tab-delimited text files that can be uploaded to the UCSC Genome Browser as ‘custom tracks’.
Currently, there are methods for exporting the data in ‘bed’ and ‘bedGraph’ format, either writing the intervals from both strands into one file or into two separate files (formats ‘bedStrand’ and ‘bedGraphStrand’, respectively). Details about these track formats can be found on the UCSC Genome Browser web pages.
The additional argument writeHeader can be set to FALSE to suppress writing the track definition header line to the file.
For Genome_intervals objects, only the ‘bed’ format is currently supported, so the format does not need to be specified.
hist: signature("AlignedGenomeIntervals"): creates a histogram of the lengths of the reads aligned to the intervals
organism: get or set the organism that the genome intervals in the object correspond to. This should be a predefined code, such as 'Mm' for mouse or 'Hs' for human. The reason for this code is that, if the organism is set, a corresponding annotation package called org.<organism>.eg.db is used, for example to obtain the chromosome lengths used by methods such as coverage. These annotation packages can be obtained from the Bioconductor repository.
plot: visualisation method; a second argument of class Genome_intervals_stranded can be provided for additional annotation in the plot. Please see the examples below and the package vignette. Refer to the documentation of plotAligned for more details on the plotting function.
reduce: collapses/reduces aligned genome intervals by combining intervals that are completely included in each other, combining overlapping intervals AND combining immediately adjacent intervals (if method="standard").
Intervals are only combined if they are on the same chromosome, on the same strand AND have the same match specificity of the aligned reads.
If you only want to combine intervals that have exactly the same start and stop positions (but may have reads of slightly different sequence aligned to them), use the argument method="exact".
If you only want to combine intervals that have exactly the same 5' or 3' end (but may differ in the other end and in the aligned sequence), use the argument method="same5" (same 5' end) or method="same3" (same 3' end).
Finally, it is possible to only collapse/reduce aligned genome intervals that overlap each other by at least a certain fraction, using the argument min.frac, a number between 0.0 and 1.0. For example, if you call reduce with the argument min.frac=0.4, only intervals that overlap each other by at least 40 percent are collapsed/merged.
sample: draws a random sample of n (argument size) of the aligned reads (without or with replacement) and returns the AlignedGenomeIntervals object defined by these aligned reads.
score: accesses or sets a custom score for the object
sort: sorts the intervals by chromosome name, start and end coordinate in increasing order (unless decreasing=TRUE is specified) and returns the sorted object
subset: takes a subset of reads; matrix-like subsetting via '[' can also be used
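The coverage computation described above can be pictured with the following base-R sketch. It is not girafe's implementation: the function name coverage_sketch is hypothetical, it returns base R rle objects instead of Rle objects, it assumes each interval contributes its reads count to every base it covers, and it takes the chromosome length from an optional named chrlengths vector or else from the maximum interval end (the annotation-package lookup is omitted). Appending the strand with paste0 for byStrand=TRUE is also an assumption about the naming.

```r
# Hypothetical sketch of per-chromosome read coverage (not girafe's code).
# Intervals are 1-based and closed; each interval adds 'reads' to every
# base it covers (assumption: coverage is weighted by read counts).
coverage_sketch <- function(chromosome, start, end, strand, reads,
                            byStrand = FALSE, chrlengths = NULL) {
  # group label: the chromosome, or chromosome plus strand if byStrand = TRUE
  grp <- if (byStrand) paste0(chromosome, strand) else chromosome
  result <- list()
  for (g in unique(grp)) {
    idx <- which(grp == g)
    chr <- chromosome[idx[1]]
    # chromosome length: supplied length if available, else the maximum end
    len <- if (!is.null(chrlengths) && chr %in% names(chrlengths)) {
      chrlengths[[chr]]
    } else {
      max(end[idx])
    }
    # difference array: +reads at each start, -reads just after each end
    d <- numeric(max(len, end[idx]) + 1)
    for (i in idx) {
      d[start[i]] <- d[start[i]] + reads[i]
      d[end[i] + 1] <- d[end[i] + 1] - reads[i]
    }
    # running sum gives per-base coverage; run-length encode it
    result[[g]] <- rle(cumsum(d)[seq_len(len)])
  }
  result
}

# usage with the intervals of the toy example further below
cv <- coverage_sketch(
  chromosome = rep(c("chr2", "chrX", "chr1"), each = 2),
  start = c(1, 3, 4, 5, 8, 10), end = c(5, 5, 6, 8, 9, 11),
  strand = c("-", "-", "+", "+", "+", "+"),
  reads = c(1, 5, 2, 7, 3, 3))
cv$chr2
```

Run-length encoding keeps the result compact because read coverage is typically constant over long stretches of a chromosome, which is the same motivation behind the Rle class.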
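The strand-aware logic of extend can be sketched in base R as follows. The function extend_sketch is hypothetical and not girafe's implementation; it assumes 1-based, closed intervals, strands coded as '+' or '-', and an optional named vector of chromosome lengths standing in for the lengths that girafe would obtain from the organism's annotation package.

```r
# Hypothetical sketch of strand-aware interval extension (not girafe's code).
extend_sketch <- function(start, end, strand, chromosome,
                          fiveprime = 0L, threeprime = 0L, chrlengths = NULL) {
  stopifnot(fiveprime >= 0L, threeprime >= 0L)
  plus <- strand == "+"
  # on the '+' strand the 5' end is the left border; on '-' it is the right
  new_start <- start - ifelse(plus, fiveprime, threeprime)
  new_end   <- end + ifelse(plus, threeprime, fiveprime)
  # if chromosome lengths are known, cap the right ends at those lengths
  if (!is.null(chrlengths)) {
    cap <- unname(chrlengths[chromosome])
    cap[is.na(cap)] <- Inf   # unknown length: leave the right end unchanged
    new_end <- pmin(new_end, cap)
  }
  data.frame(chromosome = chromosome, start = new_start, end = new_end,
             strand = strand, stringsAsFactors = FALSE)
}

# usage: the same interval on '+' and on '-', chromosome of length 24
ex <- extend_sketch(start = c(10, 10), end = c(20, 20), strand = c("+", "-"),
                    chromosome = c("chr1", "chr1"),
                    fiveprime = 2L, threeprime = 5L,
                    chrlengths = c(chr1 = 24L))
ex
```

On the '+' strand the interval becomes 8 to 25 and is then capped at 24; on the '-' strand the 3' extension is applied to the left border, giving 5 to 22.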
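To show what a ‘bed’ custom track looks like, here is a base-R sketch that formats intervals as BED lines. The function bed_lines_sketch, its track_name argument and the choice of columns (a name, e.g. from the id slot, and a score capped at 1000, e.g. from reads) are assumptions, and its output may differ from girafe's export. It assumes 1-based, closed intervals, so the start is shifted by one to obtain BED's 0-based, half-open coordinates.

```r
# Hypothetical sketch of BED-formatted output (not girafe's implementation).
bed_lines_sketch <- function(chromosome, start, end, strand, name, score,
                             track_name = "aligned reads",
                             writeHeader = TRUE) {
  body <- paste(chromosome,
                start - 1L,          # 1-based closed start -> 0-based BED start
                end,                 # closed end equals BED's exclusive end
                name,
                pmin(score, 1000L),  # BED scores are limited to 0..1000
                strand,
                sep = "\t")
  if (writeHeader) {
    # optional track definition line for the UCSC Genome Browser
    c(sprintf('track name="%s"', track_name), body)
  } else {
    body
  }
}

# usage: format two intervals and write them to a temporary file
bl <- bed_lines_sketch(chromosome = c("chr1", "chr2"), start = c(1L, 10L),
                       end = c(5L, 12L), strand = c("-", "+"),
                       name = c("gi1", "gi2"), score = c(3L, 2000L))
writeLines(bl, tempfile(fileext = ".bed"))
```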
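The merging rule of reduce with method="standard" (combine nested, overlapping and immediately adjacent intervals, but only within the same chromosome, strand and match specificity) can be sketched in base R. reduce_standard_sketch is a hypothetical name; summing the reads of merged intervals is an assumption, and the min.frac, "exact", "same5" and "same3" variants are not covered.

```r
# Hypothetical sketch of the "standard" reduce rule (not girafe's code).
# x: data.frame with columns chromosome, strand, matches, start, end, reads.
reduce_standard_sketch <- function(x) {
  # intervals may only be merged within the same chromosome/strand/matches
  key <- paste(x$chromosome, x$strand, x$matches, sep = "|")
  pieces <- lapply(split(x, key), function(g) {
    g <- g[order(g$start, g$end), ]
    out <- g[1, ]
    for (i in seq_len(nrow(g))[-1]) {
      last <- nrow(out)
      if (g$start[i] <= out$end[last] + 1) {
        # nested, overlapping or adjacent: grow the current merged interval
        out$end[last] <- max(out$end[last], g$end[i])
        out$reads[last] <- out$reads[last] + g$reads[i]
      } else {
        # gap of at least one base: start a new merged interval
        out <- rbind(out, g[i, ])
      }
    }
    out
  })
  res <- do.call(rbind, pieces)
  rownames(res) <- NULL
  res
}

# usage: on '+', [1,5], [3,8] and the adjacent [9,12] merge; [20,25] stays
x <- data.frame(chromosome = rep("chr1", 5),
                strand = c("+", "+", "+", "+", "-"),
                matches = rep(1L, 5),
                start = c(1, 3, 9, 20, 2), end = c(5, 8, 12, 25, 6),
                reads = c(2, 1, 4, 1, 3), stringsAsFactors = FALSE)
reduce_standard_sketch(x)
```

The interval on '-' is kept separate even though it overlaps the '+' intervals, because merging only happens within one strand.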
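Because sample draws individual reads rather than intervals, intervals supported by many reads are more likely to be retained. A minimal base-R sketch of that idea, assuming each interval carries a reads count; sample_reads_sketch is a hypothetical name and this is not girafe's code.

```r
# Hypothetical sketch: draw 'size' reads and rebuild per-interval counts.
sample_reads_sketch <- function(reads, size, replace = FALSE) {
  # one entry per read, labelled with the index of its interval
  read_to_interval <- rep(seq_along(reads), times = reads)
  # sample.int avoids the sample(x) pitfall when only one read exists
  drawn <- read_to_interval[sample.int(length(read_to_interval), size,
                                       replace = replace)]
  counts <- tabulate(drawn, nbins = length(reads))
  # keep only intervals that received at least one sampled read
  kept <- which(counts > 0)
  list(interval = kept, reads = counts[kept])
}

# usage with the read counts of the toy example further below
set.seed(1)
s <- sample_reads_sketch(c(1L, 5L, 2L, 7L, 3L, 3L), size = 10)
s
```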
Author(s)
Joern Toedling
See Also
Genome_intervals-class, AlignedRead-class, plotAligned
Examples
library(girafe)
library(ShortRead)  ## provides readAligned(), used in the biological example
############# toy example:
A <- new("AlignedGenomeIntervals",
.Data=cbind(c(1,3,4,5,8,10), c(5,5,6,8,9,11)),
annotation=data.frame(
seq_name=factor(rep(c("chr1","chr2","chr3"), each=2)),
strand=factor(c("-","-","+","+","+","+") ,levels=c("-","+")),
inter_base=rep(FALSE, 6)),
reads=rep(3L, 6), matches=rep(1L,6),
sequence=c("ACATT","ACA","CGT","GTAA","AG","CT"))
show(A)
detail(A)
## alternative way of creating a similar object:
A <- AlignedGenomeIntervals(
start=c(1,3,4,5,8,10), end=c(5,5,6,8,9,11),
chromosome=rep(c("chr2","chrX","chr1"), each=2),
strand=c("-","-","+","+","+","+"),
sequence=c("ACATT","ACA","CGT","GGAA","AG","CT"),
reads=c(1L, 5L, 2L, 7L, 3L, 3L))
detail(A)
## custom identifiers can be assigned to the intervals
id(A) <- paste("gi", 1:6, sep="")
## subsetting and combining
detail(A[1:4])
detail(c(A[1], A[4]))
## sorting: always useful
A <- sort(A)
detail(A)
## the 'reduce' method provides a cleaned-up, compact set
detail(reduce(A))
## with arguments specifying additional conditions for merging
detail(reduce(A, min.frac=0.8))
## 'sample' to draw a sample subset of reads and their intervals
detail(sample(A, 10))
## biological example
exDir <- system.file("extdata", package="girafe")
exA <- readAligned(dirPath=exDir, type="Bowtie",
pattern="aravinSRNA_23_no_adapter_excerpt_mm9_unmasked.bwtmap")
exAI <- as(exA, "AlignedGenomeIntervals")
organism(exAI) <- "Mm"
show(exAI)
## which chromosomes are the intervals on?
table(chromosome(exAI))
## subset
exAI[is.element(chromosome(exAI), c("chr1","chr2"))]
## compute coverage per chromosome:
coverage(exAI[is.element(chromosome(exAI), c("chr1","chr2"))])
### plotting:
load(file.path(exDir, "mgi_gi.RData"))
if (interactive())
plot(exAI, mgi.gi, chr="chrX", start=50400000, end=50410000)
### overlap with annotated genome elements:
exOv <- interval_overlap(exAI, mgi.gi)
## how many elements do read match positions generally overlap
## (base R lengths() gives the number of elements in each list entry):
table(lengths(exOv))
## which are the 13 elements overlapped by the match position with the most overlaps:
mgi.gi[exOv[[which.max(lengths(exOv))]]]
## what kinds of elements are overlapped
(tabOv <- table(as.character(mgi.gi$type)[unlist(exOv)]))
### display those classes:
my.cols <- rainbow(length(tabOv))
if (interactive())
pie(tabOv, col=my.cols, radius=0.85)