matchGenes: Find and annotate closest genes to genomic regions

View source: R/matchGenes.R

matchGenesR Documentation

Find and annotate closest genes to genomic regions

Description

Find and annotate closest genes to genomic regions

Usage

matchGenes(x, subject, type = c("any", "fiveprime"), promoterDist = 2500, skipExons = FALSE, verbose = TRUE)

Arguments

x

An IRanges or GenomicRanges object, or a data.frame with columns for start, end, and, optionally, chr or seqnames.

subject

An GenomicRanges object containing transcripts or genes that have been annotated by the function annotateTranscripts.

promoterDist

Anything within this distance to the transcription start site (TSE) will be considered a promoter.

type

Should the distance be computed to any part of the transcript or the five prime end.

skipExons

Should the annotation of exons be skipped. Skipping this part makes the code slightly faster.

verbose

logical value. If 'TRUE', it writes out some messages indicating progress. If 'FALSE' nothing should be printed.

Details

This function runs nearest and then annotates the the relationship between the region and the transcript/gene that is closest. Many details are provided on this relationship as described in the next section.

Value

A data frame with one row for each query and with columns c("name", "annotation", "description", "region", "distance", "subregion", "insideDistance", "exonnumber", "nexons", "UTR", "strand", "geneL", "codingL","Entrez","subhectHits"). The first column is the _gene_ nearest the query, by virtue of it owning the transcript determined (or chosen by nearest) to be nearest the query. Note that the nearest gene to a given query, in column 3, may not be unique and we arbitrarily chose one as done by default by nearest.

The "distance" column is the distance from the query to the 5' end of the nearest transcript, so may be different from the distance computed by nearest to that transcript, as a range.

name

Symbol of nearest gene

annotation

RefSeq ID

description

a factor with levels c("upstream", "promoter", "overlaps 5'", "inside intron", "inside exon", "covers exon(s)", "overlaps exon upstream", "overlaps exon downstream", "overlaps two exons", "overlaps 3'", "close to 3'", "downstream", "covers")

region

a factor with levels c("upstream", "promoter", "overlaps 5'", "inside", "overlaps 3'", "close to 3'", "downstream", "covers")

distance

distance before 5' end of gene

subregion

a factor with levels c("inside intron", "inside exon", "covers exon(s)", "overlaps exon upstream", "overlaps exon downstream", "overlaps two exons")

insideDistance

distance past 5' end of gene

exonnumber

which exon

nexons

number of exons

UTR

a factor with levels c("inside transcription region", "5' UTR", "overlaps 5' UTR", "3'UTR", "overlaps 3'UTR", "covers transcription region")

strand

"+" or "-"

geneL

the gene length

codingL

the coding length

Entrez

Entrez ID of closest gene

subjectHits

Index in subject of hit

Author(s)

Harris Jaffee, Peter Murakami and Rafael A. Irizarry

See Also

annotateNearest, annotateTranscripts

Examples

## Not run: 
    islands=makeGRangesFromDataFrame(read.delim("http://rafalab.jhsph.edu/CGI/model-based-cpg-islands-hg19.txt")[1:100,])
    library("TxDb.Hsapiens.UCSC.hg19.knownGene")
    genes <- annotateTranscripts(TxDb.Hsapiens.UCSC.hg19.knownGene)
    tab<- matchGenes(islands,genes)

## End(Not run)

rafalab/bumphunter documentation built on March 20, 2024, 6:22 a.m.