genomeToProtein: Map genomic coordinates to protein coordinates
In jorainer/ensembldb: Utilities to create and use Ensembl-based annotation databases

genomeToProtein

R Documentation

Map genomic coordinates to protein coordinates

Description

Map positions along the genome to positions within the protein sequence if a protein is encoded at the location. The provided coordinates have to lie completely within the genomic region of an exon of a protein-coding transcript (see genomeToTranscript() for details). In addition, they have to lie within the genomic region encoding the CDS of a transcript (excluding its stop codon; see transcriptToProtein() for details).

For genomic positions for which the mapping fails, an IRanges with negative coordinates (i.e. a start position of -1) is returned.

Usage

genomeToProtein(x, db, proteins = NA, exons = NA, transcripts = NA)

Arguments

`x`	`GRanges` with the genomic coordinates that should be mapped to within-protein coordinates.
`db`	`EnsDb` object.
`proteins`	`DFrame` object generated by `proteins()`.
`exons`	`CompressedGRangesList` object generated by `exonsBy()` where by = 'tx'.
`transcripts`	`GRanges` object generated by `transcripts()`.

Details

genomeToProtein combines calls to genomeToTranscript() and transcriptToProtein().
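To illustrate the two steps, the sketch below chains genomeToTranscript() and transcriptToProtein() by hand for a single genomic range. It is a simplified approximation rather than the function's actual implementation: it handles only the first input range and does not add the metadata columns that genomeToProtein() reports. The coordinate is taken from the Examples section.

```r
library(EnsDb.Hsapiens.v86)
edbx <- filter(EnsDb.Hsapiens.v86, filter = ~ seq_name == "X")

## One genomic range at the start of the CDS of ENST00000381578
gnm <- GRanges("X", IRanges(start = 630898, width = 5))

## Step 1: genome -> transcript-relative coordinates. The result has one
## list element per input range; names are the transcript IDs.
tx <- genomeToTranscript(gnm, edbx)

## Step 2: transcript-relative -> protein-relative coordinates, using the
## transcript IDs stored in the names of the IRanges.
prt <- transcriptToProtein(tx[[1]], edbx)
prt
```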

Value

An IRangesList with each element representing the mapping of one of the GRanges in x (i.e. the length of the IRangesList is length(x)). Each IRanges element provides the coordinates within the protein sequence, with names being the (Ensembl) IDs of the proteins. The ID of the transcript encoding the protein, the ID of the exon within which the genomic coordinates are located and the exon's rank in the transcript are provided in metadata columns "tx_id", "exon_id" and "exon_rank". Metadata column "cds_ok" indicates whether the length of the CDS matches the length of the encoded protein. Coordinates for which cds_ok = FALSE should be treated with caution, as they might not be correct. Metadata columns "seq_start", "seq_end", "seq_name" and "seq_strand" report the input genomic coordinates.

For genomic coordinates that cannot be mapped to within-protein coordinates, an IRanges with a start coordinate of -1 is returned.
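The following sketch shows one way to separate successful from failed mappings, based on the -1 start coordinate and the "cds_ok" column described above. So that it runs without an annotation database, it uses a hand-built mock object: `res_mock`, the IDs "ENSP_mock" and "ENST_mock" and all values are invented for illustration and are not real genomeToProtein() output.

```r
library(IRanges)

## Mock result mimicking the structure described above (hypothetical values)
hit <- IRanges(start = 1, end = 2, names = "ENSP_mock")
mcols(hit) <- DataFrame(tx_id = "ENST_mock", cds_ok = TRUE)
miss <- IRanges(start = -1, width = 1)
mcols(miss) <- DataFrame(tx_id = NA_character_, cds_ok = NA)
res_mock <- IRangesList(hit, miss)

## One logical per input range: was at least one position mapped?
mapped <- any(start(res_mock) > 0)
mapped

## Per input range, keep only mapped ranges whose CDS length matched the
## length of the encoded protein.
keep <- lapply(res_mock, function(z) {
    z[start(z) > 0 & mcols(z)$cds_ok %in% TRUE]
})
lengths(keep)
```

The same checks should apply unchanged to a real result, e.g. by replacing `res_mock` with the object returned by genomeToProtein().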

Author(s)

Johannes Rainer

Examples


library(EnsDb.Hsapiens.v86)
## Restrict all further queries to chromosome X to speed up the examples
edbx <- filter(EnsDb.Hsapiens.v86, filter = ~ seq_name == "X")

## In the example below we define 4 genomic regions:
## 630898: corresponds to the first nt of the CDS of ENST00000381578
## 644636: last nt of the CDS of ENST00000381578
## 644633: last nt before the stop codon in ENST00000381578
## 634829: position within an intron.
gnm <- GRanges("X", IRanges(start = c(630898, 644636, 644633, 634829),
    width = c(5, 1, 1, 3)))
res <- genomeToProtein(gnm, edbx)

## The result is an IRangesList with the same length as gnm
length(res)
length(gnm)

## The first element represents the mapping for the first GRanges:
## the coordinate is mapped to the first amino acid of the protein(s).
## The genomic coordinates can be mapped to several transcripts (and hence
## proteins).
res[[1]]

## The stop codon is not translated, thus the mapping for the second
## GRanges fails
res[[2]]

## The 3rd GRanges is mapped to the last amino acid.
res[[3]]

## Mapping of intronic positions fails
res[[4]]

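## The function can also be called in parallel processes if the proteins,
## exons and transcripts data are preloaded and passed to it.

proteins <- proteins(edbx)
## by = "tx" is the default of exonsBy(); it is spelled out here because
## the exons argument expects exons grouped by transcript.
exons <- exonsBy(edbx, by = "tx")
transcripts <- transcripts(edbx)

genomeToProtein(gnm, edbx, proteins = proteins, exons = exons,
    transcripts = transcripts)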

jorainer/ensembldb documentation built on Aug. 23, 2024, 1:16 p.m.
