encodeOverlaps-methods: Encode the overlaps between RNA-seq reads and the transcripts...
In Bioconductor/GenomicAlignments: Representation and manipulation of short genomic alignments

encodeOverlaps-methods

R Documentation

Encode the overlaps between RNA-seq reads and the transcripts of a gene model

Description

In the context of an RNA-seq experiment, encoding the overlaps between the aligned reads and the transcripts of a given gene model can be used for detecting those overlaps that are compatible with the splicing of the transcript.

The central tool for this is the encodeOverlaps method for GRangesList objects, which computes the "overlap encodings" between a query and a subject, both list-like objects with list elements containing multiple ranges.

Other related utilities are also documented in this man page.

Usage

encodeOverlaps(query, subject, hits=NULL, ...)

## S4 method for signature 'GRangesList,GRangesList'
encodeOverlaps(query, subject, hits=NULL,
               flip.query.if.wrong.strand=FALSE)

## Related utilities:

flipQuery(x, i)

selectEncodingWithCompatibleStrand(ovencA, ovencB,
                                   query.strand, subject.strand, hits=NULL)

isCompatibleWithSkippedExons(x, max.skipped.exons=NA)

extractSteppedExonRanks(x, for.query.right.end=FALSE)
extractSpannedExonRanks(x, for.query.right.end=FALSE)
extractSkippedExonRanks(x, for.query.right.end=FALSE)

extractQueryStartInTranscript(query, subject, hits=NULL, ovenc=NULL,
                              flip.query.if.wrong.strand=FALSE,
                              for.query.right.end=FALSE)

Arguments

`query`, `subject`	Typically GRangesList objects representing the the aligned reads and the transcripts of a given gene model, respectively. If the 2 objects don't have the same length, and if the `hits` argument is not supplied, then the shortest is recycled to the length of the longest (the standard recycling rules apply). More generally speaking, `query` and `subject` must be list-like objects with list elements containing multiple ranges e.g. IntegerRangesList or GRangesList objects.
`hits`	An optional Hits object typically obtained from a previous call to `findOverlaps(query, subject)`. Strictly speaking, `hits` only needs to be compatible with `query` and `subject`, that is, `queryLength(hits)` and `subjectLength(hits)` must be equal to `length(query)` and `length(subject)`, respectively. Supplying `hits` is a convenient way to do `encodeOverlaps(query[queryHits(hits)], subject[subjectHits(hits)])`, that is, calling `encodeOverlaps(query, subject, hits)` is equivalent to the above, but is much more efficient, especially when `query` and/or `subject` are big. Of course, when `hits` is supplied, `query` and `subject` are not expected to have the same length anymore.
`...`	Additional arguments for methods.
`flip.query.if.wrong.strand`	See the "OverlapEncodings" vignette located in this package (GenomicAlignments).
`x`	For `flipQuery`: a GRangesList object. For `isCompatibleWithSkippedExons`, `extractSteppedExonRanks`, `extractSpannedExonRanks`, and `extractSkippedExonRanks`: an OverlapEncodings object, a factor, or a character vector.
`i`	Subscript specifying the elements in `x` to flip. If missing, all the elements are flipped.
`ovencA`, `ovencB`, `ovenc`	OverlapEncodings objects.
`query.strand`, `subject.strand`	Vector-like objects containing the strand of the query and subject, respectively.
`max.skipped.exons`	Not supported yet. If `NA` (the default), the number of skipped exons must be 1 or more (there is no max).
`for.query.right.end`	If `TRUE`, then the information reported in the output is for the right ends of the paired-end reads. Using `for.query.right.end=TRUE` with single-end reads is an error.

Details

See ?OverlapEncodings for a short introduction to "overlap encodings".

The topic of working with overlap encodings is covered in details in the "OverlapEncodings" vignette located this package (GenomicAlignments) and accessible with vignette("OverlapEncodings").

Value

For encodeOverlaps: An OverlapEncodings object. If hits is not supplied, this object is parallel to the longest of query and subject, that is, it has the length of the longest and the i-th encoding in it corresponds to the i-th element in the longest. If hits is supplied, then the returned object is parallel to it, that is, it has one encoding per hit.

For flipQuery: TODO

For selectEncodingWithCompatibleStrand: TODO

For isCompatibleWithSkippedExons: A logical vector parallel to x.

For extractSteppedExonRanks, extractSpannedExonRanks, and extractSkippedExonRanks: TODO

For extractQueryStartInTranscript: TODO

Author(s)

Hervé Pagès

Examples

## ---------------------------------------------------------------------
## A. BETWEEN 2 IntegerRangesList OBJECTS
## ---------------------------------------------------------------------
## In the context of an RNA-seq experiment, encoding the overlaps
## between 2 GRangesList objects, one containing the reads (the query),
## and one containing the transcripts (the subject), can be used for
## detecting hits between reads and transcripts that are "compatible"
## with the splicing of the transcript. Here we illustrate this with 2
## IntegerRangesList objects, in order to keep things simple:

## 4 aligned reads in the query:
read1 <- IRanges(c(7, 15, 22), c(9, 19, 23))  # 2 junctions
read2 <- IRanges(c(5, 15), c(9, 17))  # 1 junction
read3 <- IRanges(c(16, 22), c(19, 24))  # 1 junction
read4 <- IRanges(c(16, 23), c(19, 24))  # 1 junction
query <- IRangesList(read1, read2, read3, read4)

## 1 transcript in the subject:
tx <- IRanges(c(1, 4, 15, 22, 38), c(2, 9, 19, 25, 47))  # 5 exons
subject <- IRangesList(tx)

## Encode the overlaps:
ovenc <- encodeOverlaps(query, subject)
ovenc
encoding(ovenc)

## ---------------------------------------------------------------------
## B. BETWEEN 2 GRangesList OBJECTS
## ---------------------------------------------------------------------
## With real RNA-seq data, the reads and transcripts will typically be
## stored in GRangesList objects. Please refer to the "OverlapEncodings"
## vignette in this package for realistic examples.

Bioconductor/GenomicAlignments documentation built on June 14, 2025, 5:42 p.m.