matchGenes: Find and annotate closest genes to genomic regions
In rafalab/bumphunter: Bump Hunter

matchGenes

R Documentation

Find and annotate closest genes to genomic regions

Description

Find and annotate closest genes to genomic regions

Usage

matchGenes(x, subject, type = c("any", "fiveprime"), promoterDist = 2500, skipExons = FALSE, verbose = TRUE)

Arguments

`x`	An `IRanges` or `GenomicRanges` object, or a `data.frame` with columns for `start`, `end`, and, optionally, `chr` or `seqnames`.
`subject`	An `GenomicRanges` object containing transcripts or genes that have been annotated by the function `annotateTranscripts`.
`promoterDist`	Anything within this distance to the transcription start site (TSE) will be considered a promoter.
`type`	Should the distance be computed to any part of the transcript or the five prime end.
`skipExons`	Should the annotation of exons be skipped. Skipping this part makes the code slightly faster.
`verbose`	logical value. If 'TRUE', it writes out some messages indicating progress. If 'FALSE' nothing should be printed.

Details

This function runs nearest and then annotates the the relationship between the region and the transcript/gene that is closest. Many details are provided on this relationship as described in the next section.

Value

A data frame with one row for each query and with columns c("name", "annotation", "description", "region", "distance", "subregion", "insideDistance", "exonnumber", "nexons", "UTR", "strand", "geneL", "codingL","Entrez","subhectHits"). The first column is the _gene_ nearest the query, by virtue of it owning the transcript determined (or chosen by nearest) to be nearest the query. Note that the nearest gene to a given query, in column 3, may not be unique and we arbitrarily chose one as done by default by nearest.

The "distance" column is the distance from the query to the 5' end of the nearest transcript, so may be different from the distance computed by nearest to that transcript, as a range.

`name`	Symbol of nearest gene
`annotation`	RefSeq ID
`description`	a factor with levels `c("upstream", "promoter", "overlaps 5'", "inside intron", "inside exon", "covers exon(s)", "overlaps exon upstream", "overlaps exon downstream", "overlaps two exons", "overlaps 3'", "close to 3'", "downstream", "covers")`
`region`	a factor with levels `c("upstream", "promoter", "overlaps 5'", "inside", "overlaps 3'", "close to 3'", "downstream", "covers")`
`distance`	distance before 5' end of gene
`subregion`	a factor with levels `c("inside intron", "inside exon", "covers exon(s)", "overlaps exon upstream", "overlaps exon downstream", "overlaps two exons")`
`insideDistance`	distance past 5' end of gene
`exonnumber`	which exon
`nexons`	number of exons
`UTR`	a factor with levels `c("inside transcription region", "5' UTR", "overlaps 5' UTR", "3'UTR", "overlaps 3'UTR", "covers transcription region")`
`strand`	"+" or "-"
`geneL`	the gene length
`codingL`	the coding length
`Entrez`	Entrez ID of closest gene
`subjectHits`	Index in subject of hit

Author(s)

Harris Jaffee, Peter Murakami and Rafael A. Irizarry

Examples

## Not run: 
    islands=makeGRangesFromDataFrame(read.delim("http://rafalab.jhsph.edu/CGI/model-based-cpg-islands-hg19.txt")[1:100,])
    library("TxDb.Hsapiens.UCSC.hg19.knownGene")
    genes <- annotateTranscripts(TxDb.Hsapiens.UCSC.hg19.knownGene)
    tab<- matchGenes(islands,genes)

## End(Not run)

rafalab/bumphunter documentation built on March 20, 2024, 6:22 a.m.