The transcriptLengths function extracts the transcript lengths from
a TxDb object. It also returns the CDS and UTR lengths for each
transcript if the user requests them.
A TxDb object.
Additional arguments used by
All the lengths are counted in number of nucleotides.
The length of a processed transcript is just the sum of the lengths of its
exons. This should not be confused with the length of the stretch of DNA
transcribed into RNA (a.k.a. transcription unit), which can be obtained
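The distinction between the two lengths can be illustrated with a small base R sketch. The exon coordinates below are made up for illustration and do not come from any TxDb object; the variable names are hypothetical as well.

```r
## Hypothetical exon coordinates (1-based, inclusive) for a single transcript.
exon_start <- c(101L, 501L, 901L)
exon_end   <- c(200L, 650L, 1000L)

## Length of the processed transcript: the sum of the exon widths.
exon_width <- exon_end - exon_start + 1L
tx_len <- sum(exon_width)

## Length of the transcribed stretch of DNA: exons plus introns.
unit_len <- max(exon_end) - min(exon_start) + 1L

tx_len    # 350
unit_len  # 900
```

In this toy case the introns account for the 550 nucleotides that separate the two values.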
A data frame with 1 row per transcript. The rows are guaranteed to be in
the same order as the elements of the GRanges object
The data frame has between 5 and 8 columns, depending on what the user
requested via the
The first 3 columns are the same as the metadata columns of the object returned by
tx_id: The internal transcript ID. This ID is unique within
the scope of the TxDb object. It is not an official or public
ID (like an Ensembl or FlyBase ID) or an accession number, so it
cannot be used to look up the transcript in public databases or in
other TxDb objects. Furthermore, this ID could change when
re-running the code that was used to make the TxDb object.
tx_name: An official/public transcript name or ID that can
be used to look up the transcript in public databases or in other
TxDb objects. This column is not guaranteed to contain unique
values and it can contain NAs.
gene_id: The official/public ID of the gene that the
transcript belongs to. Can be NA if the gene is unknown or if the
transcript is not considered to belong to a gene.
The other columns are quantitative:
nexon: The number of exons in the transcript.
tx_len: The length of the processed transcript.
cds_len: [optional] The length of the CDS region of the
processed transcript.
utr5_len: [optional] The length of the 5' UTR region of
the processed transcript.
utr3_len: [optional] The length of the 3' UTR region of
the processed transcript.
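As a rough illustration of how these columns might be used downstream, the base R sketch below builds a made-up data frame with the 5 default columns and picks the longest transcript of each gene. The values are invented and the object names (txlens, longest) are hypothetical; a real data frame would come from calling transcriptLengths on a TxDb object.

```r
## Made-up data frame with the same 5 default columns as described above.
txlens <- data.frame(
  tx_id   = 1:5,
  tx_name = c("txA", "txB", NA, "txD", "txD"),
  gene_id = c("g1", "g1", "g2", NA, "g2"),
  nexon   = c(3L, 5L, 1L, 2L, 4L),
  tx_len  = c(1200L, 2500L, 800L, 640L, 1900L),
  stringsAsFactors = FALSE
)

## tx_id is unique, whereas tx_name can contain NAs and duplicates.
stopifnot(anyDuplicated(txlens$tx_id) == 0L)
anyNA(txlens$tx_name)

## Drop transcripts with no known gene, then keep the longest one per gene.
with_gene <- txlens[!is.na(txlens$gene_id), ]
ordered   <- with_gene[order(with_gene$gene_id, -with_gene$tx_len), ]
longest   <- ordered[!duplicated(ordered$gene_id), ]
longest
```

Because gene_id can be NA, such transcripts are filtered out before grouping; whether that is appropriate depends on the analysis.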
transcriptsByOverlaps, for extracting
genomic feature locations from a TxDb-like object.
extracting non-overlapping exonic or intronic parts from a
extractTranscriptSeqs for extracting transcript
(or CDS) sequences from chromosome sequences.
coverageByTranscript for computing coverage by
transcript (or CDS) of a set of ranges.
makeTxDbFromEnsembl, for making a TxDb
object from online resources.
for making a TxDb object from a GRanges
object, or from a GFF or GTF file.
The TxDb class.
library(TxDb.Dmelanogaster.UCSC.dm3.ensGene)
txdb <- TxDb.Dmelanogaster.UCSC.dm3.ensGene
dm3_txlens <- transcriptLengths(txdb)
head(dm3_txlens)

dm3_txlens <- transcriptLengths(txdb, with.cds_len=TRUE,
                                      with.utr5_len=TRUE,
                                      with.utr3_len=TRUE)
head(dm3_txlens)

## When cds_len is 0 (non-coding transcript), utr5_len and utr3_len
## must also be 0:
non_coding <- dm3_txlens[dm3_txlens$cds_len == 0, ]
stopifnot(all(non_coding[6:8] == 0))

## When cds_len is not 0 (coding transcript), cds_len + utr5_len +
## utr3_len must be equal to tx_len:
coding <- dm3_txlens[dm3_txlens$cds_len != 0, ]
stopifnot(all(rowSums(coding[6:8]) == coding$tx_len))

## A sanity check:
stopifnot(identical(dm3_txlens$tx_id, mcols(transcripts(txdb))$tx_id))