transcriptsBy: Extract and group genomic features of a given type from a...

transcriptsByR Documentation

Extract and group genomic features of a given type from a TxDb-like object

Description

Generic functions to extract genomic features of a given type grouped based on another type of genomic feature. This page documents the methods for TxDb objects only.

Usage

transcriptsBy(x, by=c("gene", "exon", "cds"), ...)
## S4 method for signature 'TxDb'
transcriptsBy(x, by=c("gene", "exon", "cds"), use.names=FALSE)

exonsBy(x, by=c("tx", "gene"), ...)
## S4 method for signature 'TxDb'
exonsBy(x, by=c("tx", "gene"), use.names=FALSE)

cdsBy(x, by=c("tx", "gene"), ...)
## S4 method for signature 'TxDb'
cdsBy(x, by=c("tx", "gene"), use.names=FALSE)

intronsByTranscript(x, ...)
## S4 method for signature 'TxDb'
intronsByTranscript(x, use.names=FALSE)

fiveUTRsByTranscript(x, ...)
## S4 method for signature 'TxDb'
fiveUTRsByTranscript(x, use.names=FALSE)

threeUTRsByTranscript(x, ...)
## S4 method for signature 'TxDb'
threeUTRsByTranscript(x, use.names=FALSE)

Arguments

x

A TxDb object.

...

Arguments to be passed to or from methods.

by

One of "gene", "exon", "cds" or "tx". Determines the grouping.

use.names

Controls how to set the names of the returned GRangesList object. These functions return all the features of a given type (e.g. all the exons) grouped by another feature type (e.g. grouped by transcript) in a GRangesList object. By default (i.e. if use.names is FALSE), the names of this GRangesList object (aka the group names) are the internal ids of the features used for grouping (aka the grouping features), which are guaranteed to be unique. If use.names is TRUE, then the names of the grouping features are used instead of their internal ids. For example, when grouping by transcript (by="tx"), the default group names are the transcript internal ids ("tx_id"). But, if use.names=TRUE, the group names are the transcript names ("tx_name"). Note that, unlike the feature ids, the feature names are not guaranteed to be unique or even defined (they could be all NAs). A warning is issued when this happens. See ?id2name for more information about feature internal ids and feature external names and how to map the formers to the latters.

Finally, use.names=TRUE cannot be used when grouping by gene by="gene". This is because, unlike for the other features, the gene ids are external ids (e.g. Entrez Gene or Ensembl ids) so the db doesn't have a "gene_name" column for storing alternate gene names.

Details

These functions return a GRangesList object where the ranges within each of the elements are ordered according to the following rule:

When using exonsBy or cdsBy with by="tx", the returned exons or CDS parts are ordered by ascending rank for each transcript, that is, by their position in the transcript. In all other cases, the ranges will be ordered by chromosome, strand, start, and end values.

Value

A GRangesList object.

Author(s)

M. Carlson, P. Aboyoun and H. Pagès

See Also

  • transcripts and transcriptsByOverlaps for more ways to extract genomic features from a TxDb-like object.

  • transcriptLengths for extracting the transcript lengths (and other metrics) from a TxDb object.

  • exonicParts and intronicParts for extracting non-overlapping exonic or intronic parts from a TxDb-like object.

  • extendExonsIntoIntrons for extending exons into their adjacent introns.

  • extractTranscriptSeqs for extracting transcript (or CDS) sequences from chromosome sequences.

  • coverageByTranscript for computing coverage by transcript (or CDS) of a set of ranges.

  • select-methods for how to use the simple "select" interface to extract information from a TxDb object.

  • id2name for mapping TxDb internal ids to external names for a given feature type.

  • The TxDb class.

Examples

txdb_file <- system.file("extdata", "hg19_knownGene_sample.sqlite",
                         package="GenomicFeatures")
txdb <- loadDb(txdb_file)

## Extract the transcripts grouped by gene:
transcriptsBy(txdb, "gene")

## Extract the exons grouped by gene:
exonsBy(txdb, "gene")

## Extract the CDS parts grouped by transcript:
cds_by_tx0 <- cdsBy(txdb, "tx")
## With more informative group names:
cds_by_tx1 <- cdsBy(txdb, "tx", use.names=TRUE)
## Note that 'cds_by_tx1' can also be obtained with:
names(cds_by_tx0) <- id2name(txdb, feature.type="tx")[names(cds_by_tx0)]
stopifnot(identical(cds_by_tx0, cds_by_tx1))

## Extract the introns grouped by transcript:
intronsByTranscript(txdb)

## Extract the 5' UTRs grouped by transcript:
fiveUTRsByTranscript(txdb)
fiveUTRsByTranscript(txdb, use.names=TRUE)  # more informative group names

Bioconductor/GenomicFeatures documentation built on Sept. 30, 2024, 2:09 p.m.