ranges2annot: Hierarchical annotation of genomic regions.

ranges2annot R Documentation

Hierarchical annotation of genomic regions.

Description

Assigns region types such as promoter, exon or unknown to genomic regions such as CTSS, tag clusters, or consensus clusters.

Usage

ranges2annot(ranges, annot, upstream = 500, downstream = 500)

Arguments

ranges

A GenomicRanges::GRanges object, for example extracted from a RangedSummarizedExperiment object with the rowRanges command.

annot

A GRanges from which promoter positions will be inferred. Typically GENCODE. If the type metadata is present, it should contain gene, exon and transcript among its values. Otherwise, all entries are considered transcripts. If the transcript_type metadata is available, the entries that may not be primary products (for instance ‘snoRNA’) are discarded.

upstream

Number of bases upstream of the start of the transcript models to be considered part of the promoter region.

downstream

Number of bases downstream of the start of the transcript models to be considered part of the promoter region.
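As an illustration of how upstream and downstream delimit a promoter window, here is a minimal base R sketch. It assumes the window follows the strand-aware convention of GenomicRanges::promoters(); whether ranges2annot computes exactly this window is an assumption, and promoter_window is a hypothetical helper, not a CAGEr function.

```r
# Hypothetical helper (not part of CAGEr): promoter window around a
# transcript start, following the GenomicRanges::promoters() convention.
promoter_window <- function(start, end, strand,
                            upstream = 500, downstream = 500) {
  if (strand == "-") {
    # On the minus strand the transcript starts at its rightmost base.
    tss <- end
    c(start = tss - downstream + 1, end = tss + upstream)
  } else {
    tss <- start
    c(start = tss - upstream, end = tss + downstream - 1)
  }
}

# A transcript model spanning 650-2000, on either strand.
promoter_window(650, 2000, "+")
promoter_window(650, 2000, "-", upstream = 500, downstream = 20)
```

Under this convention the window is asymmetric when upstream and downstream differ, and it is anchored on the end coordinate for minus-strand transcripts.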

Details

Only the biotypes that are likely to have a pol II promoter are retained. The list is currently hardcoded in the function; see its source code. Examples of biotypes without a pol II promoter include VDJ segments, miRNA and snoRNA. As a consequence, the Intergenic category displayed in the output of plotAnnot may include counts overlapping real exons of discarded transcribed regions: a large percentage does not necessarily indicate an abundance of novel promoters.
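The biotype filter can be pictured with a minimal base R sketch. The keep_types vector below is purely illustrative and is not the list hardcoded in ranges2annot; the data frame is a hypothetical stand-in for the metadata of an annotation object.

```r
# Hypothetical stand-in for annotation metadata.
annot_meta <- data.frame(
  name            = c("tx1", "tx2", "tx3"),
  transcript_type = c("protein_coding", "miRNA", "snoRNA")
)

# Illustrative set of biotypes to keep (NOT the list used by ranges2annot).
keep_types <- c("protein_coding", "lincRNA")

# Keep only the entries whose biotype is in the retained set.
kept <- annot_meta[annot_meta$transcript_type %in% keep_types, ]
kept$name
```

Regions overlapping only the discarded entries (here tx2 and tx3) would then receive no annotation from them, which is how real exons can end up counted as intergenic.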

Value

A run-length-encoded (Rle) factor of the same length as ranges, indicating whether each region is promoter, exon, intron or unknown, or just promoter, gene or unknown if the type metadata is absent.
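The hierarchical assignment can be sketched in base R as follows. This assumes the priority follows the order in which the categories are listed (promoter, then exon, then intron, then unknown); the overlap flags are made-up values, and a plain factor stands in for the Rle.

```r
# Hypothetical overlap flags for five regions (illustrative values only).
in_promoter   <- c(FALSE, TRUE,  TRUE,  FALSE, FALSE)
in_exon       <- c(FALSE, FALSE, TRUE,  TRUE,  FALSE)
in_transcript <- c(FALSE, FALSE, TRUE,  TRUE,  TRUE)

classes <- c("promoter", "exon", "intron", "unknown")

# The first matching category wins; a region inside a transcript
# but outside any exon is labelled as intron.
label <- ifelse(in_promoter, "promoter",
         ifelse(in_exon, "exon",
         ifelse(in_transcript, "intron", "unknown")))

annotation <- factor(label, levels = classes)
table(annotation)
```

Keeping all four levels in the factor means that categories with no region still appear, with a count of zero, when the result is tabulated.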

Author(s)

Charles Plessy

See Also

CTSScoordinatesGR, exampleZv9_annot

Other CAGEr annotation functions: annotateCTSS(), plotAnnot(), ranges2genes(), ranges2names()

Examples

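library(CAGEr)

# Annotate the CTSS of the example dataset with the bundled example annotation.
CAGEr:::ranges2annot(CTSScoordinatesGR(exampleCAGEexp), exampleZv9_annot)

# Toy example: four single-base positions on the plus strand of chr1.
ctss <- GenomicRanges::GRanges("chr1", IRanges::IPos(c(1,100,200,1500)), "+")
ctss <- GenomicRanges::GPos(ctss, stitch = FALSE)
ctss <- as(ctss, "CTSS")

# Annotation without metadata: all entries are considered transcripts.
gr1 <- GenomicRanges::GRanges("chr1",
                              IRanges::IRanges(c(650, 650, 1400), 2000), "+")
CAGEr:::ranges2annot(ctss, gr1)

# Same ranges with type and transcript_type metadata; the miRNA entry
# has a biotype that the function discards (see Details).
gr2 <- gr1
gr2$type            <- c("transcript",     "exon",           "transcript")
gr2$transcript_type <- c("protein_coding", "protein_coding", "miRNA")
CAGEr:::ranges2annot(ctss, gr2, upstream = 500, downstream = 20)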


charles-plessy/CAGEr documentation built on Oct. 27, 2024, 10:11 p.m.