rangeBasedAccessors: Extract genomic features from an object
In Bioconductor/OrganismDbi: Software to enable the smooth interfacing of different database packages

Description

Generic functions to extract genomic features from an object. This page documents only the methods for MultiDb objects, such as OrganismDb objects.

Usage

## S4 method for signature 'MultiDb'
transcripts(x, columns=c("TXID", "TXNAME"), filter=NULL)

## S4 method for signature 'MultiDb'
exons(x, columns="EXONID", filter=NULL)

## S4 method for signature 'MultiDb'
cds(x, columns="CDSID", filter=NULL)

## S4 method for signature 'MultiDb'
genes(x, columns="GENEID", filter=NULL)

## S4 method for signature 'MultiDb'
transcriptsBy(x, by, columns, use.names=FALSE, outerMcols=FALSE)

## S4 method for signature 'MultiDb'
exonsBy(x, by, columns, use.names=FALSE, outerMcols=FALSE)

## S4 method for signature 'MultiDb'
cdsBy(x, by, columns, use.names=FALSE, outerMcols=FALSE)

## S4 method for signature 'MultiDb'
getTxDbIfAvailable(x, ...)



## S4 method for signature 'MultiDb'
asBED(x)
## S4 method for signature 'MultiDb'
asGFF(x)

## S4 method for signature 'MultiDb'
tRNAs(x)
## S4 method for signature 'MultiDb'
promoters(x, upstream=2000, downstream=200, use.names=TRUE, ...)

## S4 method for signature 'GenomicRanges,MultiDb'
distance(x, y, ignore.strand=FALSE,
    ..., id, type=c("gene", "tx", "exon", "cds"))

## S4 method for signature 'BSgenome'
extractTranscriptSeqs(x, transcripts, strand = "+")

## S4 method for signature 'MultiDb'
extractUpstreamSeqs(x, genes, width=1000, exclude.seqlevels=NULL)

## S4 method for signature 'MultiDb'
intronsByTranscript(x, use.names=FALSE)
## S4 method for signature 'MultiDb'
fiveUTRsByTranscript(x, use.names=FALSE)
## S4 method for signature 'MultiDb'
threeUTRsByTranscript(x, use.names=FALSE)

## S4 method for signature 'MultiDb'
isActiveSeq(x)

Arguments

`x`	A MultiDb object, except in the extractTranscriptSeqs method where it is a `BSgenome` object and the second argument is a MultiDb object.
`...`	Arguments to be passed to or from methods.
`by`	One of `"gene"`, `"exon"`, `"cds"` or `"tx"`. Determines the grouping.
`columns`	The columns (kinds of metadata) to retrieve from the database. Call the `columns` method on the object to list all possible values.
`filter`	Either `NULL` or a named list of vectors to be used to restrict the output. Valid names for this list are: `"gene_id"`, `"tx_id"`, `"tx_name"`, `"tx_chrom"`, `"tx_strand"`, `"exon_id"`, `"exon_name"`, `"exon_chrom"`, `"exon_strand"`, `"cds_id"`, `"cds_name"`, `"cds_chrom"`, `"cds_strand"` and `"exon_rank"`.
`use.names`	Controls how the names of the returned GRangesList object are set. These functions return all the features of a given type (e.g. all the exons) grouped by another feature type (e.g. grouped by transcript) in a GRangesList object. By default (i.e. if `use.names` is `FALSE`), the names of this GRangesList object (aka the group names) are the internal ids of the features used for grouping (aka the grouping features), which are guaranteed to be unique. If `use.names` is `TRUE`, the names of the grouping features are used instead of their internal ids. For example, when grouping by transcript (`by="tx"`), the default group names are the transcript internal ids (`"tx_id"`), but with `use.names=TRUE` the group names are the transcript names (`"tx_name"`). Note that, unlike the feature ids, the feature names are not guaranteed to be unique or even defined (they could all be `NA`s). A warning is issued when this happens. See `?id2name` for more information about feature internal ids and feature external names and how to map the former to the latter. Finally, `use.names=TRUE` cannot be used when grouping by gene (`by="gene"`). This is because, unlike the other features, genes are identified by external ids (e.g. Entrez Gene or Ensembl ids), so the db has no `"gene_name"` column for storing alternate gene names.
`upstream`	For `promoters`: an `integer(1)` value indicating the number of bases upstream from the transcription start site. For additional details see ?`promoters,GRanges-method`.
`downstream`	For `promoters`: an `integer(1)` value indicating the number of bases downstream from the transcription start site. For additional details see ?`promoters,GRanges-method`.
`y`	For `distance`, a MultiDb instance. The ranges identified by `id` are extracted from the MultiDb and used to compute the distance from `x`.
`id`	A `character` vector the same length as `x`. The values in `id` must be identifiers in the MultiDb object; `type` indicates what kind of identifier they are.
`type`	A `character(1)` describing the `id`. Must be one of ‘gene’, ‘tx’, ‘exon’ or ‘cds’.
`ignore.strand`	A `logical` indicating if the strand of the ranges should be ignored. When `TRUE`, strand is set to `'+'`.
`outerMcols`	A `logical` indicating whether the 'outer' mcols (metadata columns) should be populated for the range-based accessors that return a GRangesList object (`transcriptsBy`, `exonsBy` and `cdsBy`). By default this is `FALSE`; if `TRUE`, the outer list object has its metadata columns (mcols) populated as well as the mcols of the 'inner' GRanges objects.
`transcripts`	An object representing the exon ranges of each transcript to extract. It must be a GRangesList or MultiDb object when `x` is a `BSgenome` object. Internally, it's turned into a GRangesList object with `exonsBy(transcripts, by="tx", use.names=TRUE)`.
`strand`	Only supported when `x` is a `DNAString` object. Can be an atomic vector, a factor, or an Rle object, in which case it indicates the strand of each transcript (i.e. all the exons in a transcript are considered to be on the same strand). More precisely, it's turned into a factor (or factor-Rle) that has the "standard strand levels" (this is done by calling the `strand` function on it). Then it's recycled to the length of the IntegerRangesList object `transcripts` if needed. In the resulting object, the i-th element is interpreted as the strand of all the exons in the i-th transcript. `strand` can also be a list-like object, in which case it indicates the strand of each exon individually. It must then have the same shape as the IntegerRangesList object `transcripts` (i.e. the same length, and `strand[[i]]` must have the same length as `transcripts[[i]]` for all `i`). `strand` can only contain `"+"` and/or `"-"` values; `"*"` is not allowed.
`genes`	An object containing the locations (i.e. chromosome name, start, end, and strand) of the genes or transcripts with respect to the reference genome. Only GenomicRanges and MultiDb objects are supported at the moment. If the latter, the gene locations are obtained by calling the `genes` function on the MultiDb object internally.
`width`	How many bases to extract upstream of each TSS (transcription start site).
`exclude.seqlevels`	A character vector containing the chromosome names (a.k.a. sequence levels) to exclude when the genes are obtained from a MultiDb object.
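To make the `upstream`/`downstream` arguments of `promoters` concrete, here is a minimal base-R sketch of the window arithmetic. It assumes the convention documented for ?`promoters,GRanges-method` (the TSS is the range start on `"+"` and `"*"` ranges and the range end on `"-"` ranges, with 1-based closed coordinates). `promoter_window` is a hypothetical helper written for illustration only; it is not part of OrganismDbi or GenomicRanges.

```r
# Hypothetical helper (not part of OrganismDbi or GenomicRanges) that mirrors
# the promoter window convention: the TSS is start() on "+" and "*" ranges and
# end() on "-" ranges; coordinates are 1-based and closed.
promoter_window <- function(start, end, strand,
                            upstream = 2000L, downstream = 200L) {
  minus <- strand == "-"
  win_start <- ifelse(minus, end - downstream + 1L, start - upstream)
  win_end   <- ifelse(minus, end + upstream, start + downstream - 1L)
  # Every window has width upstream + downstream.
  data.frame(start = win_start, end = win_end,
             width = win_end - win_start + 1L)
}

# One "+" transcript at 10001-15000 and one "-" transcript at 20001-25000,
# using the default upstream = 2000 and downstream = 200.
promoter_window(start = c(10001L, 20001L), end = c(15000L, 25000L),
                strand = c("+", "-"))
# -> starts 8001 and 24801, ends 10200 and 27000, width 2200 each
```

Note how the `"-"` strand window extends past the range end: upstream of a minus-strand TSS means higher genomic coordinates.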

Details

These are the range-based functions for extracting transcript information from a MultiDb object.
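As a hedged illustration of the `filter` and `use.names` arguments, the sketch below assumes the Homo.sapiens annotation package is installed. The gene id `"1"` is an Entrez Gene id (A1BG) chosen only for illustration; any gene id present in the underlying TxDb would work the same way.

```r
library(Homo.sapiens)

# Restrict transcripts() to a single gene with 'filter'. The names of the
# filter list are the lower-case TxDb names (e.g. "gene_id"), not the
# upper-case values accepted by 'columns'.
tx <- transcripts(Homo.sapiens, columns = c("TXNAME", "SYMBOL"),
                  filter = list(gene_id = "1"))
tx

# Group exons by transcript. With use.names = TRUE the list elements are
# named by transcript name ("tx_name") instead of internal transcript id.
ex <- exonsBy(Homo.sapiens, by = "tx", columns = "EXONID", use.names = TRUE)
head(names(ex))
```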

Value

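A GRanges object for `transcripts`, `exons`, `cds`, `genes` and `promoters`; a GRangesList object for `transcriptsBy`, `exonsBy`, `cdsBy`, `intronsByTranscript`, `fiveUTRsByTranscript` and `threeUTRsByTranscript`. The other methods return the same kind of value as their TxDb counterparts (for example, `distance` returns an integer vector and `getTxDbIfAvailable` returns the underlying TxDb object, if available).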
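The sketch below, again assuming Homo.sapiens is installed, contrasts a flat GRanges result with a grouped GRangesList result and shows where `outerMcols = TRUE` puts its metadata. Which columns end up in the outer mcols depends on the OrganismDbi version, so no particular output is claimed here.

```r
library(Homo.sapiens)

# transcripts() returns a flat GRanges ...
tx <- transcripts(Homo.sapiens, columns = "TXNAME")
class(tx)

# ... while the *By() accessors return a GRangesList grouped by 'by'.
# outerMcols = TRUE also populates the metadata columns of the outer list.
txByGene <- transcriptsBy(Homo.sapiens, by = "gene", columns = "SYMBOL",
                          outerMcols = TRUE)
class(txByGene)
mcols(txByGene)        # outer metadata: one row per list element (gene)
mcols(txByGene[[1]])   # inner metadata: one row per transcript of that gene
```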

Author(s)

M. Carlson

Examples

## extracting all transcripts from Homo.sapiens with some extra metadata
library(Homo.sapiens)
cols <- c("TXNAME", "SYMBOL")
res <- transcripts(Homo.sapiens, columns=cols)

## extracting all transcripts from Homo.sapiens, grouped by gene and
## with extra metadata
res <- transcriptsBy(Homo.sapiens, by="gene", columns=cols)

## list possible values for columns argument:
columns(Homo.sapiens)

## Get the TxDb from a MultiDb object (if it's available)
getTxDbIfAvailable(Homo.sapiens)

## Other functions listed above should work in a way similar to their TxDb
## counterparts. So for example:
promoters(Homo.sapiens)
## Should give the same value as:
promoters(getTxDbIfAvailable(Homo.sapiens))
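The `distance` method is not covered by the examples above. The following sketch assumes Homo.sapiens (hg19 coordinates) is installed; the query coordinates are arbitrary, chosen only so that each query lies on the same chromosome as its target gene. The ids are Entrez Gene ids: `"1"` is A1BG on chr19 and `"2"` is A2M on chr12.

```r
library(Homo.sapiens)

# Hypothetical query ranges, one per target gene (arbitrary hg19 coordinates).
query <- GRanges(c("chr19", "chr12"),
                 IRanges(start = c(1000000, 1000000), width = 100))

# Distance from query[i] to the gene whose id is id[i]; 'type' says that the
# ids are gene ids.
d <- distance(query, Homo.sapiens, id = c("1", "2"), type = "gene")
d
```

Because `id` is matched element-wise against `x`, it must have the same length as the query ranges.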

Bioconductor/OrganismDbi documentation built on June 14, 2025, 5:45 p.m.