encodeOverlaps-methods: Encode the overlaps between RNA-seq reads and the transcripts...

encodeOverlaps-methodsR Documentation

Encode the overlaps between RNA-seq reads and the transcripts of a gene model

Description

In the context of an RNA-seq experiment, encoding the overlaps between the aligned reads and the transcripts of a given gene model can be used for detecting those overlaps that are compatible with the splicing of the transcript.

The central tool for this is the encodeOverlaps method for GRangesList objects, which computes the "overlap encodings" between a query and a subject, both list-like objects with list elements containing multiple ranges.

Other related utilities are also documented in this man page.

Usage

encodeOverlaps(query, subject, hits=NULL, ...)

## S4 method for signature 'GRangesList,GRangesList'
encodeOverlaps(query, subject, hits=NULL,
               flip.query.if.wrong.strand=FALSE)

## Related utilities:

flipQuery(x, i)

selectEncodingWithCompatibleStrand(ovencA, ovencB,
                                   query.strand, subject.strand, hits=NULL)

isCompatibleWithSkippedExons(x, max.skipped.exons=NA)

extractSteppedExonRanks(x, for.query.right.end=FALSE)
extractSpannedExonRanks(x, for.query.right.end=FALSE)
extractSkippedExonRanks(x, for.query.right.end=FALSE)

extractQueryStartInTranscript(query, subject, hits=NULL, ovenc=NULL,
                              flip.query.if.wrong.strand=FALSE,
                              for.query.right.end=FALSE)

Arguments

query, subject

Typically GRangesList objects representing the the aligned reads and the transcripts of a given gene model, respectively. If the 2 objects don't have the same length, and if the hits argument is not supplied, then the shortest is recycled to the length of the longest (the standard recycling rules apply).

More generally speaking, query and subject must be list-like objects with list elements containing multiple ranges e.g. IntegerRangesList or GRangesList objects.

hits

An optional Hits object typically obtained from a previous call to findOverlaps(query, subject).

Strictly speaking, hits only needs to be compatible with query and subject, that is, queryLength(hits) and subjectLength(hits) must be equal to length(query) and length(subject), respectively.

Supplying hits is a convenient way to do encodeOverlaps(query[queryHits(hits)], subject[subjectHits(hits)]), that is, calling encodeOverlaps(query, subject, hits) is equivalent to the above, but is much more efficient, especially when query and/or subject are big. Of course, when hits is supplied, query and subject are not expected to have the same length anymore.

...

Additional arguments for methods.

flip.query.if.wrong.strand

See the "OverlapEncodings" vignette located in this package (GenomicAlignments).

x

For flipQuery: a GRangesList object.

For isCompatibleWithSkippedExons, extractSteppedExonRanks, extractSpannedExonRanks, and extractSkippedExonRanks: an OverlapEncodings object, a factor, or a character vector.

i

Subscript specifying the elements in x to flip. If missing, all the elements are flipped.

ovencA, ovencB, ovenc

OverlapEncodings objects.

query.strand, subject.strand

Vector-like objects containing the strand of the query and subject, respectively.

max.skipped.exons

Not supported yet. If NA (the default), the number of skipped exons must be 1 or more (there is no max).

for.query.right.end

If TRUE, then the information reported in the output is for the right ends of the paired-end reads. Using for.query.right.end=TRUE with single-end reads is an error.

Details

See ?OverlapEncodings for a short introduction to "overlap encodings".

The topic of working with overlap encodings is covered in details in the "OverlapEncodings" vignette located this package (GenomicAlignments) and accessible with vignette("OverlapEncodings").

Value

For encodeOverlaps: An OverlapEncodings object. If hits is not supplied, this object is parallel to the longest of query and subject, that is, it has the length of the longest and the i-th encoding in it corresponds to the i-th element in the longest. If hits is supplied, then the returned object is parallel to it, that is, it has one encoding per hit.

For flipQuery: TODO

For selectEncodingWithCompatibleStrand: TODO

For isCompatibleWithSkippedExons: A logical vector parallel to x.

For extractSteppedExonRanks, extractSpannedExonRanks, and extractSkippedExonRanks: TODO

For extractQueryStartInTranscript: TODO

Author(s)

Hervé Pagès

See Also

  • The OverlapEncodings class for a brief introduction to "overlap encodings".

  • The Hits class defined and documented in the S4Vectors package.

  • The "OverlapEncodings" vignette in this package.

  • findCompatibleOverlaps for a specialized version of findOverlaps that uses encodeOverlaps internally to keep only the hits where the junctions in the aligned read are compatible with the splicing of the annotated transcript.

  • The GRangesList class defined and documented in the GenomicRanges package.

  • The findOverlaps generic function defined in the IRanges package.

Examples

## ---------------------------------------------------------------------
## A. BETWEEN 2 IntegerRangesList OBJECTS
## ---------------------------------------------------------------------
## In the context of an RNA-seq experiment, encoding the overlaps
## between 2 GRangesList objects, one containing the reads (the query),
## and one containing the transcripts (the subject), can be used for
## detecting hits between reads and transcripts that are "compatible"
## with the splicing of the transcript. Here we illustrate this with 2
## IntegerRangesList objects, in order to keep things simple:

## 4 aligned reads in the query:
read1 <- IRanges(c(7, 15, 22), c(9, 19, 23))  # 2 junctions
read2 <- IRanges(c(5, 15), c(9, 17))  # 1 junction
read3 <- IRanges(c(16, 22), c(19, 24))  # 1 junction
read4 <- IRanges(c(16, 23), c(19, 24))  # 1 junction
query <- IRangesList(read1, read2, read3, read4)

## 1 transcript in the subject:
tx <- IRanges(c(1, 4, 15, 22, 38), c(2, 9, 19, 25, 47))  # 5 exons
subject <- IRangesList(tx)

## Encode the overlaps:
ovenc <- encodeOverlaps(query, subject)
ovenc
encoding(ovenc)

## ---------------------------------------------------------------------
## B. BETWEEN 2 GRangesList OBJECTS
## ---------------------------------------------------------------------
## With real RNA-seq data, the reads and transcripts will typically be
## stored in GRangesList objects. Please refer to the "OverlapEncodings"
## vignette in this package for realistic examples.

Bioconductor/GenomicAlignments documentation built on March 28, 2024, 9:59 a.m.