View source: R/defineRegions.R
defineRegions | R Documentation |
Use gene, transcript and exon annotations to define genomic regions
defineRegions(
genes,
transcripts,
exons,
promoter = c(2500, 500),
upstream = 5000,
intron = TRUE,
proximal = 10000,
simplify = FALSE,
cols = c("gene_id", "gene_name", "transcript_id", "transcript_name"),
...
)
genes , transcripts , exons |
GRanges objects defining each level of annotation |
promoter |
Numeric vector defining upstream and/or downstream distances for a promoter. Passing a single value will define a symmetrical promoter The first value represents the upstream range |
upstream |
The distance from a TSS defining an upstream promoter |
intron |
logical(1) Separate gene bodies into introns and exons. If
|
proximal |
Distance from a gene to be considered a proximal intergenic region. If set to 0, intergenic regions will simply be considered as uniformly intergenic |
simplify |
Passed internally to |
cols |
Column names to be retained from the supplied annotations |
... |
Not used |
Using GRanges annotated as genes, transcripts and exons this function will define ranges uniquely assigned to a region type using a hierarchical process. By default, these region types will be (in order) 1) Promoters, 2) Upstream Promoters, 3) Exons, 4) Introns, 5) Proximal Intergenic and 6) Distal Intergenic.
Setting intron = FALSE
will replace introns and exons with a generic "Gene
Body" annotation.
Setting proximal = 0
will return all intergenic regions (not previously
annotated as promoters or upstream promoters) to an "Intergenic" category
Notably, once a region has been defined, it is excluded from all subsequent candidate regions.
Any columns matching the names provided in cols will be returned, and it is
assumed that the gene/transcript/exon ranges will contain informative columns
in the mcols()
element.
A GRangesList
## Define two exons for two transcripts
sq <- Seqinfo(seqnames = "chr1", seqlengths = 50000)
e <- c("chr1:20001-21000", "chr1:29001-29950", "chr1:22001-23000", "chr1:29001-30000")
e <- GRanges(e, seqinfo = sq)
mcols(e) <- DataFrame(
gene_id = "Gene1", transcript_id = paste0("Trans", c(1, 1, 2, 2))
)
## Define the transcript ranges
t <- unlist(endoapply(split(e, e$transcript_id), range))
t$gene_id <- "Gene1"
t$transcript_id <- names(t)
names(t) <- NULL
## Summarise to gene level
g <- reduceMC(t)
g$transcript_id <- NA_character_
## Now annotate the regions
regions <- defineRegions(genes = g, transcripts = t, exons = e)
sort(unlist(regions))
## Alternatively, collpse gene body and intergenic ranges
regions <- defineRegions(
genes = g, transcripts = t, exons = e, intron = FALSE, proximal = 0
)
sort(unlist(regions))
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.