Add detailed exon-based annotation to specified genomic regions.
detailRanges(incoming, txdb, orgdb, dist=5000, promoter=c(3000, 1000),
    key.field="ENTREZID", name.field="SYMBOL", ignore.strand=TRUE)
incoming
A GRanges object containing the ranges to be annotated.
txdb
A TxDb object for the genome of interest.
orgdb
An OrgDb object for the genome of interest.
dist
An integer scalar specifying the flanking distance to annotate.
promoter
An integer vector of length 2, where the first and second values define the promoter as some distance upstream and downstream from the TSS, respectively.
key.field
A character scalar specifying the key type in orgdb that corresponds to the gene IDs in txdb.
name.field
A character scalar specifying the column from orgdb that contains the gene symbols.
ignore.strand
A logical scalar indicating whether strandedness in incoming should be ignored.
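To make the promoter argument concrete, the following base R sketch computes a promoter interval around a TSS on either strand. The helper promoter_interval is hypothetical and not part of csaw, and the exact boundary convention (whether the endpoints are inclusive) is an assumption; the coordinates csaw computes internally may differ by a base.

```r
# Hypothetical helper (not part of csaw): promoter interval around a TSS.
# promoter[1] is the upstream distance, promoter[2] the downstream distance.
promoter_interval <- function(tss, strand, promoter = c(3000L, 1000L)) {
    stopifnot(length(promoter) == 2L, strand %in% c("+", "-"))
    upstream <- promoter[1]
    downstream <- promoter[2]
    if (strand == "+") {
        # On the forward strand, upstream means smaller coordinates.
        c(start = tss - upstream, end = tss + downstream)
    } else {
        # On the reverse strand, upstream means larger coordinates.
        c(start = tss - downstream, end = tss + upstream)
    }
}

# Made-up TSS position, for illustration only.
promoter_interval(10000L, "+")
promoter_interval(10000L, "-")
```

With the default promoter=c(3000, 1000), the interval is asymmetric around the TSS and flips orientation with the strand of the gene.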
This function adds annotations to a given set of genomic regions in the form of compact character strings specifying the features overlapping and flanking each region.
The aim is to determine the genic context of empirically identified regions, for some basic biological interpretation of binding/marking in those regions.
All neighboring genes within a specified range are reported, rather than just the closest gene to the region.
If a region in incoming is stranded and ignore.strand=FALSE, annotated features will only be reported if they lie on the same strand as that region.
If incoming is missing, then the annotation will be provided directly to the user in the form of a GRanges object.
This may be more useful when further work on the annotation is required.
Features are labelled as exons ("E"), promoters ("P") or gene bodies ("G").
Overlaps to introns can be identified by finding those regions that overlap with gene bodies but not with any of the corresponding exons.
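The intron rule above can be sketched with plain integer intervals in base R. This is only an illustration of the logic, assuming closed 1-based coordinates; the helpers overlaps_any and is_intronic and all coordinates are made up, and in practice the same test would be run on the GRanges object returned when incoming is missing.

```r
# Hypothetical helper: TRUE if [start, end] overlaps any interval given by
# the parallel vectors 'starts' and 'ends' (closed coordinates).
overlaps_any <- function(start, end, starts, ends) {
    any(start <= ends & end >= starts)
}

# Made-up coordinates for a single gene, for illustration only.
gene_body <- c(start = 1000, end = 5000)
exon_starts <- c(1000, 3000, 4500)
exon_ends <- c(1200, 3300, 5000)

# A region is intronic if it overlaps the gene body but none of its exons.
is_intronic <- function(start, end) {
    in_body <- overlaps_any(start, end, gene_body[["start"]], gene_body[["end"]])
    in_exon <- overlaps_any(start, end, exon_starts, exon_ends)
    in_body && !in_exon
}

is_intronic(2000, 2500)  # inside the gene body, between the first two exons
is_intronic(3100, 3200)  # inside the second exon
```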
The default settings for key.field and name.field will work for human and mouse genomes, but may not work for other organisms.
The key.field should specify the key type in the orgdb object that corresponds to the gene IDs of the txdb object.
For example, in S. cerevisiae, key.field is set to "ORF" to match the gene IDs in the corresponding TxDb object, while name.field is set to "GENENAME" to obtain the gene symbols.
If incoming is not provided, a GRanges object will be returned containing ranges for the exons, promoters and gene bodies.
Gene keys (e.g., Entrez IDs) are provided as row names.
Gene symbols and feature types are stored as metadata.
If incoming is a GRanges object, a list will be returned with overlap, left and right elements.
Each element is a character vector of length equal to the number of ranges in incoming.
Each non-empty string records the gene symbol, the overlapped exons and the strand.
For left and right, the gap between the range and the annotated feature is also included.
For annotated features overlapping a region, the character string in the overlap output vector will be of the form GENE:STRAND:TYPE.
GENE is the gene symbol by default, but reverts to the key (default Entrez ID) if no symbol is defined.
STRAND is simply the strand of the gene, either "+" or "-".
The TYPE indicates the feature types that are overlapped: exon ("E"), promoter ("P") and/or intron ("I").
Note that intron overlaps are only reported if the region does not overlap an exon directly.
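For downstream filtering it can help to split these strings back into their fields. The base R sketch below does this with a hypothetical helper, parse_annotation, which is not part of csaw. It assumes that multiple features within one string are separated by commas and that gene names contain neither ":" nor ","; the input string is made up for illustration.

```r
# Hypothetical helper (not part of csaw): split GENE:STRAND:TYPE entries
# into a data frame with one row per annotated feature.
# Assumption: entries within one string are comma-separated.
parse_annotation <- function(x) {
    entries <- unlist(strsplit(x, ",", fixed = TRUE))
    entries <- entries[nzchar(entries)]  # drop empty annotations
    fields <- strsplit(entries, ":", fixed = TRUE)
    data.frame(
        gene = vapply(fields, `[`, character(1), 1L),
        strand = vapply(fields, `[`, character(1), 2L),
        type = vapply(fields, `[`, character(1), 3L),
        stringsAsFactors = FALSE
    )
}

# Made-up annotation string, for illustration only.
parsed <- parse_annotation("GeneA:+:PE,GeneB:-:I")
parsed

# Flag entries whose TYPE includes an exon overlap.
grepl("E", parsed$type, fixed = TRUE)
```

The same helper can be applied to entries of the left and right vectors, where the third field holds the distance rather than the feature type and can be converted with as.integer().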
For annotated features flanking the region within a distance of dist, the TYPE is instead the distance to the feature.
This represents the gap between the edge of the region and the closest exon for that gene.
Flanking promoters are not reported, as it is more informative to report the distance to the exon directly.
Flanking an intron should be impossible without overlapping an exon directly, in which case the intron is not reported (see above).
Note that exons directly overlapping the supplied region are not considered for flanking annotation, as the distance would be negative.
The strand information is often useful in conjunction with the left/right flanking features. For example, if an exon for a negative-strand gene is to the left, the current region must be upstream of that exon. Conversely, if the exon for a positive-strand gene is to the left, the region must be downstream. The opposite applies for features to the right of the current region.
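The strand rule described above reduces to a small lookup, sketched here in base R. The helper relative_position is hypothetical and not part of csaw; it takes the side on which the flanking feature lies and the strand of its gene, and returns where the region sits relative to that feature.

```r
# Hypothetical helper (not part of csaw): position of the region relative
# to a flanking feature. 'side' is where the feature lies relative to the
# region ("left" or "right"); 'strand' is the strand of the feature's gene.
relative_position <- function(side, strand) {
    stopifnot(side %in% c("left", "right"), strand %in% c("+", "-"))
    if (side == "left") {
        # The feature has smaller coordinates than the region, so the region
        # is downstream for "+" genes and upstream for "-" genes.
        if (strand == "+") "downstream" else "upstream"
    } else {
        # The opposite applies when the feature lies to the right.
        if (strand == "+") "upstream" else "downstream"
    }
}

relative_position("left", "-")
relative_position("left", "+")
relative_position("right", "+")
```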
Aaron Lun
library(csaw)
library(org.Mm.eg.db)
library(TxDb.Mmusculus.UCSC.mm10.knownGene)

# Load the example ranges shipped with csaw.
current <- readRDS(system.file("exdata", "exrange.rds", package="csaw"))

# Annotate each range with overlapping and flanking mouse genes.
output <- detailRanges(current, orgdb=org.Mm.eg.db,
    txdb=TxDb.Mmusculus.UCSC.mm10.knownGene)
head(output$overlap)
head(output$right)
head(output$left)