proteinToGenome | R Documentation |
proteinToGenome
is a generic function for mapping
ranges of protein-relative positions to the genome.
NOTE: This man page is for the proteinToGenome
S4 generic
function and methods defined in the GenomicFeatures package,
which are (loosely) modeled on the proteinToGenome
function from the ensembldb package.
See ?ensembldb::proteinToGenome
for the latter.
## S4 generic function:
proteinToGenome(x, db, ...) # dispatch is on 2nd argument 'db'
## S4 method for signature 'ANY'
proteinToGenome(x, db)
## S4 method for signature 'GRangesList'
proteinToGenome(x, db)
x |
A named IRanges object (or derivative) containing ranges of protein-relative positions (protein-relative positions are positions relative to a protein sequence). The names on mcols(transcripts(db, columns="tx_name"))$tx_name And for the method for GRangesList objects,
names(db) |
db |
For the default For the method for GRangesList objects: A named GRangesList object (or derivative) where each list element is a GRanges object representing a CDS (the ranges in the GRanges object must represent the CDS parts ordered by ascending exon rank). |
... |
Further arguments to be passed to specific methods. |
The proteinToGenome()
method for GRangesList
objects is the workhorse behind the default method. Note that the latter
is a thin wrapper around the former, which simply does the following:
Use cdsBy()
to extract the CDS parts from db
.
The CDS parts are returned in a GRangesList
object that has the names of the transcript on it (one transcript
name per list element).
Call proteinToGenome()
on x
and the
GRangesList object returned by
cdsBy()
.
A named GRangesList object parallel to
x
(the transcript names on x
are propagated).
The i-th list element in the returned object is the result of mapping
the range of protein-relative positions x[i]
to the genome.
Note that a given range in x
can only be mapped to the genome
if the name on it is the name of a coding transcript. If it's
not (i.e. if it's the name of a non-coding transcript), then
an empty GRanges object is placed in the returned
object to indicate the impossible mapping, and a warning is issued.
Otherwise, if a given range in x
can be mapped to the
genome, then the result of the mapping is represented by a
non-empty GRanges object.
Note that this object represents the original CDS associated to
x
, trimmed on its 5' end or 3' end, or on both.
Furthermore, this object will have the same metadata columns as the
GRanges object representing the original CDS,
plus the 2 following ones:
protein_start
: The protein-relative start of the mapping.
protein_end
: The protein-relative end of the mapping.
Unlike ensembldb::proteinToGenome()
which
can work either with Ensembl protein IDs or Ensembl transcript IDs
on x
, the default proteinToGenome()
method described
above only accepts transcript names on x
.
This means that, if the user is in possession of protein IDs, they must first replace them with the corresponding transcript IDs (referred to as transcript names in the context of TxDb objects). How to do this exactly depends on the origin of those IDs (UCSC, Ensembl, GTF/GFF3 file, FlyBase, etc...)
H. Pagès, using ensembldb::proteinToGenome()
for
inspiration and design.
The proteinToGenome
function in the
ensembldb package, which the proteinToGenome()
generic and methods documented in this man page are (loosely)
modeled on.
TxDb objects.
EnsDb objects (TxDb-like objects) in the ensembldb package.
transcripts
for extracting transcripts from a
TxDb-like object.
cdsBy
for extracting CDS parts from a
TxDb-like object.
IRanges objects in the IRanges package.
GRanges and GRangesList objects in the GenomicRanges package.
## ---------------------------------------------------------------------
## USING TOY CDS
## ---------------------------------------------------------------------
## CDS1 has 2 CDS parts:
CDS1 <- GRanges(c("chrX:11-60:+", "chrX:101-125:+"))
## CDS2 has 3 CDS parts:
CDS2 <- GRanges(c("chrY:201-230:-", "chrY:101-125:-", "chrY:11-60:-"))
## Put them in a GRangesList object:
cds_by_tx <- GRangesList(TX1=CDS1, TX2=CDS2)
cds_by_tx
x1 <- IRanges(start=8, end=20, names="TX1")
proteinToGenome(x1, cds_by_tx)
x2 <- IRanges(start=c(1, 18), end=c(25, 20), names=c("TX1", "TX1"))
x2
proteinToGenome(x2, cds_by_tx)
x3 <- IRanges(start=8, end=15, names="TX2")
proteinToGenome(x3, cds_by_tx)
x4 <- c(x3, x2)
x4
proteinToGenome(x4, cds_by_tx)
## ---------------------------------------------------------------------
## USING A TxDb OBJECT
## ---------------------------------------------------------------------
library(TxDb.Dmelanogaster.UCSC.dm3.ensGene)
txdb <- TxDb.Dmelanogaster.UCSC.dm3.ensGene
## The first transcript (FBtr0309810) is non-coding:
x <- IRanges(c(FBtr0309810="11-55", FBtr0306539="90-300"))
res <- proteinToGenome(x, txdb)
res
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.