transcriptToGenome: Map transcript-relative coordinates to genomic coordinates

View source: R/transcriptToX.R

transcriptToGenomeR Documentation

Map transcript-relative coordinates to genomic coordinates

Description

transcriptToGenome maps transcript-relative coordinates to genomic coordinates. Provided coordinates are expected to be relative to the first nucleotide of the transcript, not the CDS. CDS-relative coordinates have to be converted to transcript-relative positions first with the cdsToTranscript() function.

Usage

transcriptToGenome(x, db, id = "name")

Arguments

x

IRanges with the coordinates within the transcript. Coordinates are counted from the start of the transcript (including the 5' UTR). The Ensembl IDs of the corresponding transcripts have to be provided either as names of the IRanges, or in one of its metadata columns.

db

EnsDb object or pre-loaded exons 'CompressedGRangesList' object using exonsBy().

id

character(1) specifying where the transcript identifier can be found. Has to be either "name" or one of colnames(mcols(prng)).

Value

GRangesList with the same length (and order) than the input IRanges x. Each GRanges in the GRangesList provides the genomic coordinates corresponding to the provided within-transcript coordinates. The original transcript ID and the transcript-relative coordinates are provided as metadata columns as well as the ID of the individual exon(s). An empty GRanges is returned for transcripts that can not be found in the database.

Author(s)

Johannes Rainer

See Also

cdsToTranscript() and transcriptToCds() for the mapping between CDS- and transcript-relative coordinates.

Other coordinate mapping functions: cdsToTranscript(), genomeToProtein(), genomeToTranscript(), proteinToGenome(), proteinToTranscript(), transcriptToCds(), transcriptToProtein()

Examples


library(EnsDb.Hsapiens.v86)
## Restrict all further queries to chromosome x to speed up the examples
edbx <- filter(EnsDb.Hsapiens.v86, filter = ~ seq_name == "X")

## Below we map positions 1 to 5 within the transcript ENST00000381578 to
## the genome. The ID of the transcript has to be provided either as names
## or in one of the IRanges' metadata columns
txpos <- IRanges(start = 1, end = 5, names = "ENST00000381578")

transcriptToGenome(txpos, edbx)
## The object returns a GRangesList with the genomic coordinates, in this
## example the coordinates are within the same exon and map to a single
## genomic region.

## Next we map nucleotides 501 to 505 of ENST00000486554 to the genome
txpos <- IRanges(start = 501, end = 505, names = "ENST00000486554")

transcriptToGenome(txpos, edbx)
## The positions within the transcript are located within two of the
## transcripts exons and thus a `GRanges` of length 2 is returned.

## Next we map multiple regions, two within the same transcript and one
## in a transcript that does not exist.
txpos <- IRanges(start = c(501, 1, 5), end = c(505, 10, 6),
    names = c("ENST00000486554", "ENST00000486554", "some"))

res <- transcriptToGenome(txpos, edbx)

## The length of the result GRangesList has the same length than the
## input IRanges
length(res)

## The result for the last region is an empty GRanges, because the
## transcript could not be found in the database
res[[3]]

res
## If you are tring to map a huge list of transcript-relative coordinates 
## to genomic level, you shall use pre-loaded exons GRangesList to replace 
## the SQLite db edbx

exons <- exonsBy(EnsDb.Hsapiens.v86)

## Below is just a lazy demo of querying 10^4 transcript-relative 
## coordinates without any pre-splitting 
library(parallel)

txpos <-  IRanges(
    start = rep(1,10), 
    end = rep(30,10),
    names = c(rep('ENST00000486554',9),'some'),
    note = rep('something',10))

## only run in Linux ## 
# res_temp <- mclapply(1:10, function(ind){
#     transcriptToGenome(txpos[ind], exons)
# }, mc.preschedule = TRUE, mc.cores = detectCores() - 1)

# res <- do.call(c,res_temp)
cl <- makeCluster(detectCores() - 1)
clusterExport(cl,c('transcriptToGenome','txpos','exons'))
res <- parLapply(cl,1:10,function(ind){
    transcriptToGenome(txpos[ind], exons)
})
stopCluster(cl)

jorainer/ensembldb documentation built on Aug. 23, 2024, 1:16 p.m.