AlignedGenomeIntervals-class: Class 'AlignedGenomeIntervals'


Description

A class for representing reads from next-generation sequencing experiments that have been aligned to genomic intervals.

Objects from the Class

Objects can be created either by:

  1. calls of the form new("AlignedGenomeIntervals", .Data, closed, ...).

  2. using the auxiliary function AlignedGenomeIntervals and supplying separate vectors of the same length which hold the required information:
    AlignedGenomeIntervals(start, end, chromosome, strand, reads, matches, sequence)
    If arguments reads or matches are not specified, they are assumed to be '1' for all intervals.

  3. or, probably the most common way, by coercing from objects of class AlignedRead.

Slots

.Data:

two-column integer matrix, holding the start and end coordinates of the intervals on the chromosomes

sequence:

character; sequence of the read aligned to the interval

reads:

integer; total number of reads that were aligned to this interval

matches:

integer; the total number of genomic intervals to which the reads aligned to this interval were aligned. A value of '1' thus means that the read sequence matches uniquely to this one genomic interval

organism:

string; an identifier for the organism whose genome the intervals refer to. Functions making use of this slot require a specific annotation package org.<organism>.eg.db. For example, if organism is 'Hs', the annotation package 'org.Hs.eg.db' is used by these functions. The annotation packages can be obtained from the Bioconductor repositories.

annotation:

data.frame; see class genome_intervals for details

closed:

matrix; see class genome_intervals for details

type:

character; see class genome_intervals for details

score:

numeric; optional score for each aligned genome interval

id:

character; optional identifier for each aligned genome interval

chrlengths:

integer; optional named integer vector of chromosome lengths for the respective genome; if present it is used in place of the chromosome lengths retrieved from the annotation package (see slot organism)

Extends

Class "Genome_intervals", directly. Class "Intervals_full", by class "Genome_intervals", distance 2.

Methods

coerce

Coercion method from objects of class AlignedRead, which is defined in package ShortRead, to objects of class AlignedGenomeIntervals

coverage

signature("AlignedGenomeIntervals"): computes the read coverage over all chromosomes. If the organism of the object is set correctly, the chromosome lengths are retrieved from the appropriate annotation package, otherwise the maximum interval end is taken to be the absolute length of that chromosome (strand).
The result of this method is a list and the individual list elements are of class Rle, a class for encoding long repetitive vectors that is defined in package IRanges.
The additional argument byStrand governs whether the coverage is computed separately for each strand. If byStrand=FALSE (default), only one result is returned per chromosome. If byStrand=TRUE, the result is two separate Rle objects per chromosome, with the strand appended to the chromosome name.
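
The following base-R sketch illustrates the idea behind the coverage computation described above. It is not the package's implementation: the data.frame iv and the function coverage_sketch are hypothetical, base R's rle stands in for the Rle class, and it is an assumption of the sketch that each interval contributes its reads count to the coverage.

```r
# Hypothetical toy input: closed, 1-based intervals with read counts.
iv <- data.frame(
  chr    = c("chr1", "chr1", "chr2"),
  strand = c("+", "-", "+"),
  start  = c(1L, 3L, 4L),
  end    = c(5L, 5L, 6L),
  reads  = c(3L, 2L, 1L),
  stringsAsFactors = FALSE
)

# Sketch only (not the girafe code): per-position coverage, run-length encoded.
coverage_sketch <- function(iv, byStrand = FALSE) {
  # group by chromosome, or by chromosome with the strand appended
  key <- if (byStrand) paste0(iv$chr, iv$strand) else iv$chr
  lapply(split(iv, key), function(d) {
    # no chromosome lengths available here, so the maximum interval end is used
    depth <- integer(max(d$end))
    for (i in seq_len(nrow(d))) {
      pos <- d$start[i]:d$end[i]
      depth[pos] <- depth[pos] + d$reads[i]  # assumption: weight by reads
    }
    rle(depth)  # base-R run-length encoding, standing in for an Rle object
  })
}

cov_all    <- coverage_sketch(iv)                   # one element per chromosome
cov_strand <- coverage_sketch(iv, byStrand = TRUE)  # one element per chromosome and strand
```

With byStrand=TRUE, the list names in this sketch are "chr1+", "chr1-" and "chr2+", mirroring the naming scheme described above.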

detail

signature("AlignedGenomeIntervals"): a more detailed output of all the intervals than provided by show; only advisable for objects containing few intervals

extend

signature("AlignedGenomeIntervals") with additional arguments fiveprime=0L and threeprime=0L. These must be integer numbers and greater than or equal to 0. They specify how much is subtracted from the left border of the interval and added to the right side. Which end is 5' and which one is 3' are determined from the strand information of the object. Lastly, if the object has an organism annotation, it is checked that the right ends of the intervals do not exceed the respective chromosome lengths.
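
As a rough illustration of the strand-dependent behaviour described above, the following base-R sketch extends plain start/end vectors. The function extend_sketch is hypothetical and not part of the package; it also omits the check against chromosome lengths.

```r
# Sketch only: strand-aware extension of closed intervals.
extend_sketch <- function(start, end, strand, fiveprime = 0L, threeprime = 0L) {
  stopifnot(fiveprime >= 0L, threeprime >= 0L)
  minus <- strand == "-"
  # on the "+" strand the 5' end is the left border,
  # on the "-" strand it is the right border
  left  <- ifelse(minus, threeprime, fiveprime)
  right <- ifelse(minus, fiveprime, threeprime)
  data.frame(start = start - left, end = end + right, strand = strand,
             stringsAsFactors = FALSE)
}

# the same interval on both strands, extended by 3 at the 5' end and 1 at the 3' end
ext <- extend_sketch(start = c(10L, 10L), end = c(20L, 20L),
                     strand = c("+", "-"), fiveprime = 3L, threeprime = 1L)
```

In this sketch the "+" interval becomes 7 to 21, while the "-" interval becomes 9 to 23, because the two ends swap roles on the reverse strand.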

export

export the aligned intervals as tab-delimited text files which can be uploaded to the UCSC genome browser as ‘custom tracks’. Currently, there are methods for exporting the data into ‘bed’ format and ‘bedGraph’ format, either writing the intervals from both strands into one file or into two separate files (formats ‘bedStrand’ and ‘bedGraphStrand’, respectively). Details about these track formats can be found at the UCSC genome browser web pages.
The additional argument writeHeader can be set to FALSE to suppress writing of the track definition header line to the file.
For Genome_intervals objects, only ‘bed’ format is supported at the moment and does not need to be specified.
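
To give an idea of what a ‘bed’ custom track looks like, the following base-R sketch builds the text lines for a few intervals. It follows the general UCSC BED layout (chromosome, 0-based start, exclusive end, name, score, strand) and is not the export method's code. The function bed_lines, the track name and the use of the read count as the score column are assumptions made for illustration.

```r
# Sketch only: generic BED lines, not the output of girafe's export method.
bed_lines <- function(chr, start, end, strand, reads,
                      name = "reads", writeHeader = TRUE) {
  # BED uses 0-based starts and exclusive ends, so a closed 1-based
  # interval [start, end] is written as (start - 1, end)
  body <- paste(chr, start - 1L, end, ".", reads, strand, sep = "\t")
  header <- sprintf('track name="%s"', name)  # track definition line
  if (writeHeader) c(header, body) else body
}

with_header    <- bed_lines("chr1", 1L, 5L, "+", 3L)
without_header <- bed_lines("chr1", 1L, 5L, "+", 3L, writeHeader = FALSE)
# writeLines(with_header, "example.bed") would write the track to a file
```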

hist

signature("AlignedGenomeIntervals"): creates a histogram of the lengths of the reads aligned to the intervals

organism

Get or set the organism that the genome intervals in the object correspond to. Should be a predefined code, such as 'Mm' for mouse and 'Hs' for human. The reason for this code is that, if the organism is set, a corresponding annotation package called org.<organism>.eg.db is used, for example for obtaining the chromosome lengths to be used in methods such as coverage. These annotation packages can be obtained from the Bioconductor repository.

plot

visualisation method; a second argument of class Genome_intervals_stranded can be provided for additional annotation to the plot. Please see below and in the vignette for examples. Refer to the documentation of plotAligned for more details on the plotting function.

reduce

collapse/reduce aligned genome intervals by combining intervals which are completely included in each other, combining overlapping intervals AND combining immediately adjacent intervals (if method="standard"). Intervals are only combined if they are on the same chromosome, the same strand AND have the same match specificity of the aligned reads.
If you only want to combine intervals that have exactly the same start and stop position (but may have reads of slightly different sequence aligned to them), then use the argument method="exact".
If you only want to combine intervals that have exactly the same 5' or 3' end (but may differ in the other end and in the aligned sequence), then use the argument method="same5" (same 5' end) or method="same3" (same 3' end).
Finally, it's possible to only collapse/reduce aligned genome intervals that overlap each other by at least a certain fraction using the argument min.frac. min.frac is a number between 0.0 and 1.0. For example, if you call reduce with argument min.frac=0.4, only intervals that overlap each other by at least 40 percent are collapsed/merged.
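
The merging rule for method="standard" can be sketched in base R as follows. This is an illustration under simplifying assumptions, not the package's implementation: reduce_sketch and the data.frame iv are hypothetical, the input is assumed to be non-empty, and the handling of reads, sequences and min.frac is left out.

```r
# Sketch only: merge contained, overlapping and immediately adjacent
# closed intervals within one group (non-empty input assumed).
reduce_sketch <- function(start, end) {
  ord <- order(start, end)
  start <- start[ord]
  end   <- end[ord]
  run_max <- cummax(end)
  # a new merged interval begins where a start lies beyond
  # the running maximum end plus one (i.e. not even adjacent)
  new_grp <- c(TRUE, start[-1] > head(run_max, -1) + 1L)
  grp <- cumsum(new_grp)
  data.frame(start = as.vector(tapply(start, grp, min)),
             end   = as.vector(tapply(end, grp, max)))
}

# Intervals are only merged within the same chromosome, strand and
# match specificity, so the sketch is applied separately to each such group.
iv <- data.frame(
  chr     = c("chr1", "chr1", "chr1", "chr1"),
  strand  = c("+", "+", "+", "-"),
  matches = c(1L, 1L, 1L, 1L),
  start   = c(1L, 3L, 6L, 4L),
  end     = c(5L, 4L, 8L, 7L),
  stringsAsFactors = FALSE
)
key    <- interaction(iv$chr, iv$strand, iv$matches, drop = TRUE)
merged <- lapply(split(iv, key), function(d) reduce_sketch(d$start, d$end))
```

Here the three "+" intervals collapse into a single interval from 1 to 8 (the second is contained in the first, the third is immediately adjacent), while the "-" interval stays separate even though it overlaps them.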

sample

draws a random sample of n (argument size) of the aligned reads (with or without replacement) and returns the AlignedGenomeIntervals object defined by these aligned reads.

score

access or set a custom score for the object

sort

sorts the intervals by chromosome name, start and end coordinate in increasing order (unless decreasing=TRUE is specified) and returns the sorted object

subset

take a subset of reads; matrix-like subsetting via '[' can also be used

Author(s)

Joern Toedling

See Also

Genome_intervals-class, AlignedRead-class, plotAligned

Examples

  library(girafe)  # attach the package that defines AlignedGenomeIntervals

  ############# toy example:
  A <- new("AlignedGenomeIntervals",
         .Data=cbind(c(1,3,4,5,8,10), c(5,5,6,8,9,11)),
         annotation=data.frame(
           seq_name=factor(rep(c("chr1","chr2","chr3"), each=2)),
           strand=factor(c("-","-","+","+","+","+") ,levels=c("-","+")),
           inter_base=rep(FALSE, 6)),
         reads=rep(3L, 6), matches=rep(1L,6),
         sequence=c("ACATT","ACA","CGT","GTAA","AG","CT"))

  show(A)
  detail(A)

  ## alternative way of creating such an object:
  A <- AlignedGenomeIntervals(
     start=c(1,3,4,5,8,10), end=c(5,5,6,8,9,11),
     chromosome=rep(c("chr2","chrX","chr1"), each=2),
     strand=c("-","-","+","+","+","+"),
     sequence=c("ACATT","ACA","CGT","GGAA","AG","CT"),
     reads=c(1L, 5L, 2L, 7L, 3L, 3L))
  detail(A)

  ## custom identifiers can be assigned to the intervals
  id(A) <- paste("gi", 1:6, sep="")

  ## subsetting and combining
  detail(A[c(1:4)])
  detail(c(A[1], A[4]))

  ## sorting: always useful
  A <- sort(A)
  detail(A)
  
  ## the 'reduce' method provides a cleaned-up, compact set
  detail(reduce(A))
  ##  with arguments specifying additional conditions for merging
  detail(reduce(A, min.frac=0.8))

  ## 'sample' to draw a sample subset of reads and their intervals
  detail(sample(A, 10))
  
  ## biological example
  exDir <- system.file("extdata", package="girafe")
  exA   <- readAligned(dirPath=exDir, type="Bowtie", 
    pattern="aravinSRNA_23_no_adapter_excerpt_mm9_unmasked.bwtmap")
  exAI <- as(exA, "AlignedGenomeIntervals")
  organism(exAI) <- "Mm"
  show(exAI)
  ## which chromosomes are the intervals on?
  table(chromosome(exAI))

  ## subset
  exAI[is.element(chromosome(exAI),  c("chr1","chr2"))]

  ## compute coverage per chromosome:
  coverage(exAI[is.element(chromosome(exAI),  c("chr1","chr2"))])

  ### plotting:
  load(file.path(exDir, "mgi_gi.RData"))
  if (interactive())
     plot(exAI, mgi.gi, chr="chrX", start=50400000, end=50410000)

  ### overlap with annotated genome elements:
  exOv <- interval_overlap(exAI, mgi.gi)
  ## how many elements do read match positions generally overlap:
  table(listLen(exOv))
  ## what are the 13 elements overlapped by a single match position:
  mgi.gi[exOv[[which.max(listLen(exOv))]]]
  ## what kinds of elements are overlapped
  (tabOv <- table(as.character(mgi.gi$type)[unlist(exOv)]))
  ### display those classes:
  my.cols <- rainbow(length(tabOv))
  if (interactive())
     pie(tabOv, col=my.cols, radius=0.85)

girafe documentation built on Nov. 8, 2020, 4:56 p.m.