transcripts: Extract genomic features from a TxDb-like object

transcripts    R Documentation

Extract genomic features from a TxDb-like object

Description

Generic functions to extract genomic features from a TxDb-like object. This page documents the methods for TxDb objects only.

Usage

transcripts(x, ...)
## S4 method for signature 'TxDb'
transcripts(x, columns=c("tx_id", "tx_name"), filter=NULL, use.names=FALSE)

exons(x, ...)
## S4 method for signature 'TxDb'
exons(x, columns="exon_id", filter=NULL, use.names=FALSE)

cds(x, ...)
## S4 method for signature 'TxDb'
cds(x, columns="cds_id", filter=NULL, use.names=FALSE)

genes(x, ...)
## S4 method for signature 'TxDb'
genes(x, columns="gene_id", filter=NULL, single.strand.genes.only=TRUE)

## S4 method for signature 'TxDb'
promoters(x, upstream=2000, downstream=200, use.names=TRUE, ...)
## S4 method for signature 'TxDb'
terminators(x, upstream=2000, downstream=200, use.names=TRUE, ...)

Arguments

x

A TxDb object.

...

For the transcripts(), exons(), cds(), and genes() generic functions: arguments to be passed to methods.

For the promoters() and terminators() methods for TxDb objects: arguments to be passed to the internal call to transcripts().

columns

Columns to include in the output. Must be NULL or a character vector of column names as returned by the columns method, with the following restrictions:

  • "TXCHROM" and "TXSTRAND" are not allowed for transcripts().

  • "EXONCHROM" and "EXONSTRAND" are not allowed for exons().

  • "CDSCHROM" and "CDSSTRAND" are not allowed for cds().

If the vector is named, those names are used for the corresponding column in the element metadata of the returned object.

filter

Either NULL or a named list of vectors to be used to restrict the output. Valid names for this list are: "gene_id", "tx_id", "tx_name", "tx_chrom", "tx_strand", "exon_id", "exon_name", "exon_chrom", "exon_strand", "cds_id", "cds_name", "cds_chrom", "cds_strand" and "exon_rank".

use.names

TRUE or FALSE. If TRUE, the feature names are set as the names of the returned object, with NAs being replaced with empty strings.

single.strand.genes.only

TRUE or FALSE. If TRUE (the default), then genes are returned in a GRanges object and those genes that cannot be represented by a single genomic range (because they have exons located on both strands of the same reference sequence or on more than one reference sequence) are dropped with a message.

If FALSE, then all the genes are returned in a GRangesList object with the columns specified through the columns argument set as top-level metadata columns. (Please keep in mind that the top-level metadata columns of a GRangesList object are not displayed by the show() method.)

upstream, downstream

For promoters(): Single integer values indicating the number of bases upstream and downstream from the TSS (transcription start site).

For terminators(): Single integer values indicating the number of bases upstream and downstream from the TES (transcription end site).

For additional details see ?GenomicRanges::promoters in the GenomicRanges package.
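The columns, filter and use.names arguments described above can be combined in a single call. The following is a minimal sketch, assuming the sample database shipped with GenomicFeatures (the same file used in the Examples section); the names "id" and "name" given to the columns vector are arbitrary labels chosen for illustration.

```r
library(GenomicFeatures)

txdb <- loadDb(system.file("extdata", "hg19_knownGene_sample.sqlite",
                           package="GenomicFeatures"))

## A named 'columns' vector: the names ("id", "name") are illustrative
## and become the metadata column names of the result.
tx <- transcripts(txdb,
                  columns=c(id="tx_id", name="tx_name"),
                  filter=list(tx_strand="-"),   # minus-strand transcripts only
                  use.names=TRUE)               # transcript names become names(tx)

colnames(mcols(tx))
head(names(tx))
```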

Details

These are the main functions for extracting features from a TxDb-like object. Note that cds() extracts the individual CDS parts stored in x, that is, the CDS regions associated with exons, as one flat set of ranges. It is often more useful to extract them grouped by transcript with cdsBy().
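As a rough illustration of the difference between the two approaches, the sketch below contrasts cds() with cdsBy(). It assumes the sample database shipped with GenomicFeatures; the variable names are illustrative only.

```r
library(GenomicFeatures)

txdb <- loadDb(system.file("extdata", "hg19_knownGene_sample.sqlite",
                           package="GenomicFeatures"))

## Flat set of CDS parts: a GRanges object with one range per CDS part.
cds_parts <- cds(txdb)

## The same kind of ranges grouped by transcript: a GRangesList object
## with one list element per coding transcript.
cds_by_tx <- cdsBy(txdb, by="tx")

cds_parts
cds_by_tx
```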

To restrict the output based on interval information, use the transcriptsByOverlaps(), exonsByOverlaps(), or cdsByOverlaps() function.
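An overlap-based query might look like the sketch below. It assumes the sample database shipped with GenomicFeatures. The query region roi is hypothetical: it simply reuses the range of the first transcript, so that at least one hit is expected.

```r
library(GenomicFeatures)

txdb <- loadDb(system.file("extdata", "hg19_knownGene_sample.sqlite",
                           package="GenomicFeatures"))

## Hypothetical region of interest: the range of the first transcript,
## stripped of its metadata columns.
roi <- granges(transcripts(txdb)[1])

## Keep only the transcripts that overlap the region of interest.
hits <- transcriptsByOverlaps(txdb, roi)
hits
```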

The promoters() and terminators() functions compute user-defined promoter or terminator regions for the transcripts in a TxDb-like object. The returned object is a GRanges with one range per transcript in the TxDb-like object. Each range represents the promoter or terminator region associated with a transcript, that is, the region around the TSS (transcription start site) or TES (transcription end site) the span of which is defined by upstream and downstream.

For additional details on how the promoter and terminator ranges are computed and the handling of + and - strands see ?GenomicRanges::promoters in the GenomicRanges package.
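The strand handling can be checked directly. The sketch below assumes the sample database shipped with GenomicFeatures and the semantics documented in ?GenomicRanges::promoters: on the + strand the TSS is the start of the transcript, on the - strand it is the end. The variable names are illustrative.

```r
library(GenomicFeatures)

txdb <- loadDb(system.file("extdata", "hg19_knownGene_sample.sqlite",
                           package="GenomicFeatures"))

tx <- transcripts(txdb, use.names=TRUE)
prom <- promoters(tx, upstream=100, downstream=50)

plus  <- as.character(strand(tx)) == "+"
minus <- as.character(strand(tx)) == "-"

## Every promoter spans upstream + downstream bases.
table(width(prom))

## + strand: the region begins 100 bases before the transcript start.
head(start(tx)[plus] - start(prom)[plus])

## - strand: the region ends 100 bases after the transcript end.
head(end(prom)[minus] - end(tx)[minus])
```

Note that promoter ranges computed this way are not trimmed, so they can extend past the ends of the reference sequence.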

Value

A GRanges object, except when genes() is used with single.strand.genes.only=FALSE, in which case a GRangesList object is returned.

Author(s)

M. Carlson, P. Aboyoun and H. Pagès

See Also

  • transcriptsBy and transcriptsByOverlaps for more ways to extract genomic features from a TxDb-like object.

  • transcriptLengths for extracting the transcript lengths (and other metrics) from a TxDb object.

  • exonicParts and intronicParts for extracting non-overlapping exonic or intronic parts from a TxDb-like object.

  • extendExonsIntoIntrons for extending exons into their adjacent introns.

  • extractTranscriptSeqs for extracting transcript (or CDS) sequences from reference sequences.

  • getPromoterSeq for extracting gene promoter or terminator sequences.

  • coverageByTranscript for computing coverage by transcript (or CDS) of a set of ranges.

  • select-methods for how to use the simple "select" interface to extract information from a TxDb object.

  • microRNAs and tRNAs for extracting microRNA or tRNA genomic ranges from a TxDb object.

  • id2name for mapping TxDb internal ids to external names for a given feature type.

  • The TxDb class.

Examples

txdb_file <- system.file("extdata", "hg19_knownGene_sample.sqlite",
                         package="GenomicFeatures")
txdb <- loadDb(txdb_file)

## ---------------------------------------------------------------------
## transcripts()
## ---------------------------------------------------------------------

tx1 <- transcripts(txdb)
tx1

transcripts(txdb, use.names=TRUE)
transcripts(txdb, columns=NULL, use.names=TRUE)

filter <- list(tx_chrom = c("chr3", "chr5"), tx_strand = "+")
tx2 <- transcripts(txdb, filter=filter)
tx2

## Sanity checks:
stopifnot(
  identical(mcols(tx1)$tx_id, seq_along(tx1)),
  ## Mirror the filter above: chr3 or chr5, + strand only.
  identical(tx2, tx1[as.character(seqnames(tx1)) %in% c("chr3", "chr5") &
                     as.character(strand(tx1)) == "+"])
)

## ---------------------------------------------------------------------
## exons()
## ---------------------------------------------------------------------

exons(txdb, columns=c("EXONID", "TXNAME"),
            filter=list(exon_id=1))
exons(txdb, columns=c("EXONID", "TXNAME"),
            filter=list(tx_name="uc009vip.1"))

## ---------------------------------------------------------------------
## genes()
## ---------------------------------------------------------------------

genes(txdb)  # a GRanges object
cols <- c("tx_id", "tx_chrom", "tx_strand",
          "exon_id", "exon_chrom", "exon_strand")
## By default, genes are returned in a GRanges object and those that
## cannot be represented by a single genomic range (because they have
## exons located on both strands of the same reference sequence or on
## more than one reference sequence) are dropped with a message:
single_strand_genes <- genes(txdb, columns=cols)

## Because we've returned single strand genes only, the "tx_chrom"
## and "exon_chrom" metadata columns are guaranteed to match
## 'seqnames(single_strand_genes)':
stopifnot(identical(as.character(seqnames(single_strand_genes)),
                    as.character(mcols(single_strand_genes)$tx_chrom)))
stopifnot(identical(as.character(seqnames(single_strand_genes)),
                    as.character(mcols(single_strand_genes)$exon_chrom)))

## and also the "tx_strand" and "exon_strand" metadata columns are
## guaranteed to match 'strand(single_strand_genes)':
stopifnot(identical(as.character(strand(single_strand_genes)),
                    as.character(mcols(single_strand_genes)$tx_strand)))
stopifnot(identical(as.character(strand(single_strand_genes)),
                    as.character(mcols(single_strand_genes)$exon_strand)))

all_genes <- genes(txdb, columns=cols, single.strand.genes.only=FALSE)
all_genes  # a GRangesList object
multiple_strand_genes <- all_genes[elementNROWS(all_genes) >= 2]
multiple_strand_genes
mcols(multiple_strand_genes)

## ---------------------------------------------------------------------
## promoters() and terminators()
## ---------------------------------------------------------------------

## This:
promoters(txdb, upstream=100, downstream=50)
## is equivalent to:
tx <- transcripts(txdb, use.names=TRUE)
promoters(tx, upstream=100, downstream=50)

## And this:
terminators(txdb, upstream=100, downstream=50)
## is equivalent to:
terminators(tx, upstream=100, downstream=50)

## Extra arguments are passed to transcripts(). So this:
columns <- c("tx_name", "gene_id")
promoters(txdb, upstream=100, downstream=50, columns=columns)
## is equivalent to:
promoters(transcripts(txdb, columns=columns, use.names=TRUE),
          upstream=100, downstream=50)

Bioconductor/GenomicFeatures documentation built on March 14, 2024, 6:16 a.m.