promoterRegions: Generate Annotation for Promoter Regions of Genes


View source: R/promoterRegions.R

Description

Create a SAF data-frame of genewise promoter regions.

Usage

promoterRegions(
    annotation = "mm10",
    upstream = 3000L,
    downstream = 2000L)

Arguments

annotation

a data.frame containing gene annotation in SAF format or a character string giving the name of a genome with built-in annotation. If using built-in annotation, the character string should be one of the following: mm10, mm9, hg38 or hg19 corresponding to the NCBI RefSeq annotations for the genomes ‘mm10’, ‘mm9’, ‘hg38’ and ‘hg19’, respectively.
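To use your own annotation instead of a genome name, supply a data.frame in SAF format. The base-R sketch below builds a small one by hand. The gene IDs and coordinates are made up for illustration and are not real RefSeq entries. The commented-out call assumes Rsubread is installed.

```r
# A minimal hand-built SAF annotation (illustrative values only).
saf <- data.frame(
  GeneID = c("geneA", "geneB"),
  Chr    = c("chr1", "chr1"),
  Start  = c(10000L, 50000L),
  End    = c(15000L, 58000L),
  Strand = c("+", "-"),
  stringsAsFactors = FALSE
)
str(saf)

# With Rsubread installed, the data.frame can be passed in place of a
# genome name, for example:
# prom <- Rsubread::promoterRegions(saf, upstream = 1000L, downstream = 500L)
```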

upstream

an integer giving the number of bases to include in each gene's promoter region immediately upstream (on the 5' side) of its transcription start site.

downstream

an integer giving the number of bases to include in each gene's promoter region immediately downstream (on the 3' side) of its transcription start site.
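The direction of "upstream" depends on strand. For a '+' strand gene the transcription start site (TSS) is its Start coordinate and upstream bases have lower coordinates. For a '-' strand gene the TSS is its End coordinate and upstream bases have higher coordinates. The hypothetical helper `promoter_window` below sketches this arithmetic without any clipping. It assumes 1-based inclusive coordinates and counts the TSS as the first downstream base; the exact off-by-one convention used by promoterRegions is not stated on this page.

```r
# Hypothetical helper: promoter window around each TSS, before clipping.
# Assumes 1-based inclusive coordinates; the TSS is the first downstream base.
promoter_window <- function(start, end, strand, upstream, downstream) {
  plus <- strand == "+"
  tss  <- ifelse(plus, start, end)   # '+' genes start at Start, '-' at End
  data.frame(
    Start  = ifelse(plus, tss - upstream, tss - downstream + 1),
    End    = ifelse(plus, tss + downstream - 1, tss + upstream),
    Strand = strand,
    stringsAsFactors = FALSE
  )
}

w <- promoter_window(start = c(10000, 50000), end = c(15000, 58000),
                     strand = c("+", "-"), upstream = 3000, downstream = 2000)
w
```

Under these assumptions each window spans upstream + downstream bases (5000 here), placed on opposite sides of the TSS for the two strands.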

Details

This function takes a SAF format gene annotation as input and produces a SAF format data.frame containing the chromosomal coordinates of the specified promoter region for each gene. See featureCounts for the definition of the SAF format.
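A SAF annotation may list several rows per GeneID (featureCounts treats each row as a feature and summarizes by GeneID), whereas the output has one region per gene. One plausible way to reduce the input to a single span per gene, assumed here rather than taken from the Rsubread source, is to take the smallest Start and largest End over each gene's rows. `collapse_genes` and the toy `exons` table are hypothetical.

```r
# Toy exon-level SAF: geneA has two rows (illustrative values only).
exons <- data.frame(
  GeneID = c("geneA", "geneA", "geneB"),
  Chr    = c("chr1", "chr1", "chr1"),
  Start  = c(10000L, 14000L, 50000L),
  End    = c(11000L, 15000L, 58000L),
  Strand = c("+", "+", "-"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one span per gene (min Start, max End over its rows).
# Assumes all rows of a gene share the same Chr and Strand.
collapse_genes <- function(saf) {
  genes  <- saf[!duplicated(saf$GeneID), c("GeneID", "Chr", "Strand")]
  starts <- tapply(saf$Start, saf$GeneID, min)
  ends   <- tapply(saf$End, saf$GeneID, max)
  genes$Start <- as.integer(starts[match(genes$GeneID, names(starts))])
  genes$End   <- as.integer(ends[match(genes$GeneID, names(ends))])
  genes[, c("GeneID", "Chr", "Start", "End", "Strand")]
}

g <- collapse_genes(exons)
g
```

Keeping the first row of each gene preserves its original row name, which is one way gaps such as 1, 4, 6 can appear in row names like those in the example output.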

Regardless of the upstream and downstream values, the downstream end of each region never extends past the end of the gene, and the upstream end never extends beyond the boundaries of its chromosome. Setting downstream to Inf, or to any sufficiently large value, therefore extends each region to the end of the gene, so that the whole gene body is included.
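The clipping rules above can be sketched in base R as follows. This is not the Rsubread implementation: it assumes 1-based inclusive coordinates, counts the TSS as the first downstream base, and takes chromosome lengths from a caller-supplied named vector `chr_len`, which is a hypothetical argument. `clip_promoters` is likewise a hypothetical name.

```r
# Hypothetical sketch of the clipping rules described in Details.
clip_promoters <- function(saf, upstream, downstream, chr_len) {
  plus <- saf$Strand == "+"
  len  <- unname(chr_len[saf$Chr])   # assumed chromosome lengths
  # '+' strand: TSS = Start; clip upstream at base 1, downstream at gene End
  start_p <- pmax(saf$Start - upstream, 1)
  end_p   <- pmin(saf$Start + downstream - 1, saf$End)
  # '-' strand: TSS = End; clip downstream at gene Start, upstream at chr end
  start_m <- pmax(saf$End - downstream + 1, saf$Start)
  end_m   <- pmin(saf$End + upstream, len)
  data.frame(
    GeneID = saf$GeneID, Chr = saf$Chr,
    Start  = ifelse(plus, start_p, start_m),
    End    = ifelse(plus, end_p, end_m),
    Strand = saf$Strand,
    stringsAsFactors = FALSE
  )
}

genes <- data.frame(GeneID = c("geneA", "geneB"), Chr = "chr1",
                    Start = c(1000, 50000), End = c(15000, 58000),
                    Strand = c("+", "-"), stringsAsFactors = FALSE)

p <- clip_promoters(genes, upstream = 3000, downstream = 2000,
                    chr_len = c(chr1 = 60000))
b <- clip_promoters(genes, upstream = 0, downstream = Inf,
                    chr_len = c(chr1 = 60000))
```

In `p`, geneA's upstream end is clipped at base 1 and geneB's at the assumed chromosome end (60000). In `b`, upstream = 0 with downstream = Inf returns exactly each gene body, mirroring the mm10 example in the Examples section.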

Value

A SAF format data.frame with columns GeneID, Chr, Start, End and Strand.

Author(s)

Gordon K Smyth

See Also

featureCounts, getInBuiltAnnotation

Examples

# To get whole gene bodies for the mouse genome:
x <- promoterRegions("mm10", upstream = 0, downstream = Inf)
head(x)

Example output

NCBI RefSeq annotation for mm10 (build 38.1).
      GeneID  Chr   Start     End Strand
1     497097 chr1 3214482 3671498      -
4  100503874 chr1 3647309 3658904      -
6  100038431 chr1 3680155 3681788      +
7      19888 chr1 4290846 4409241      -
13     20671 chr1 4490928 4496413      -
18     27395 chr1 4773198 4785726      -

Rsubread documentation built on March 17, 2021, 6:01 p.m.