getGeneLengthAndGCContent: Get gene length and GC-content


View source: R/getLengthAndGC.R

Description

Automatically retrieves gene length and GC-content information from Biomart or org.db packages.

Usage

getGeneLengthAndGCContent(id, org, mode=c("biomart", "org.db"))

Arguments

id

Character vector of one or more ENSEMBL or ENTREZ gene IDs.

org

Organism three-letter code, e.g. 'hsa' for 'Homo sapiens'. See also: http://www.genome.jp/kegg/catalog/org_list.html. In 'org.db' mode, this can also be a specific genome assembly, e.g. 'hg38' or 'sacCer3'.

mode

Mode used to retrieve the information. Defaults to 'biomart'. See Details.

Details

The 'biomart' mode is based on functionality from the biomaRt package and retrieves the required information from the BioMart database. This is available for all ENSEMBL organisms and is typically the most current, but can be time-consuming when querying several thousand genes at a time.

The 'org.db' mode uses organism-based annotation packages from Bioconductor. This is much faster than the 'biomart' mode, but is only available for the selected model organisms currently supported by BioC annotation functionality.

Results for the same gene ID(s) can differ between the two modes because they rely on different sources for the underlying genome assembly: the 'biomart' mode uses the latest ENSEMBL version, whereas the 'org.db' mode uses BioC annotation packages that are typically built from UCSC.
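Because the two modes can disagree, it may help to query both for the same ID and compare the results. The following is a minimal sketch only: compare_modes and its argument names are hypothetical and not part of EDASeq, and the call assumes network access as well as the annotation packages needed for the chosen assembly.

```r
# Hypothetical helper (not part of EDASeq): query the same gene ID(s)
# in both modes and return the two results in a named list.
compare_modes <- function(id, org_biomart = "hsa", org_orgdb = "hg38") {
    # Fail early if EDASeq is not installed.
    stopifnot(requireNamespace("EDASeq", quietly = TRUE))
    list(
        biomart = EDASeq::getGeneLengthAndGCContent(id, org_biomart,
                                                    mode = "biomart"),
        org.db  = EDASeq::getGeneLengthAndGCContent(id, org_orgdb,
                                                    mode = "org.db")
    )
}

# Not run: needs network access and installed annotation packages.
# res <- compare_modes("ENSG00000012048")
```

Small differences between the two list elements would be consistent with the different underlying assemblies described above.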

Value

A numeric matrix with two columns: gene length and GC-content.
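As a rough illustration of working with a result of this shape, the sketch below builds a mock matrix by hand rather than calling the function. The row names, the column names 'length' and 'gc', the values, and the use of NA for genes without information are all assumptions made for the example.

```r
# Mock result with the documented shape: one row per gene, two columns.
# Row names, column names and values are illustrative assumptions.
lgc <- matrix(
    c(1200, 0.42,
      3500, 0.55,
      NA,   NA),
    ncol = 2, byrow = TRUE,
    dimnames = list(c("geneA", "geneB", "geneC"), c("length", "gc"))
)

# Keep only genes for which both values are available.
complete <- lgc[stats::complete.cases(lgc), , drop = FALSE]

# Pull each column out as a named numeric vector.
gene_length <- complete[, "length"]
gc_content  <- complete[, "gc"]
```

Using drop = FALSE keeps the result a matrix even if only one gene remains after filtering.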

Author(s)

Ludwig Geistlinger <Ludwig.Geistlinger@bio.ifi.lmu.de>

See Also

getSequence to retrieve a genomic sequence from BioMart, genes to extract genomic coordinates from a TxDb object, getSeq to extract genomic sequences from a BSgenome object, alphabetFrequency to calculate nucleotide frequencies.
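Once a gene sequence is available, the two reported quantities follow from simple counting. The sketch below uses base R on a toy string instead of alphabetFrequency, so it only illustrates the idea and is not the package's implementation; gc_fraction and toy_seq are hypothetical names.

```r
# Hypothetical helper: fraction of G and C among all characters of a
# DNA sequence given as a single string. Ambiguous bases such as N
# count towards the total but not towards G/C.
gc_fraction <- function(seq) {
    bases <- strsplit(toupper(seq), "")[[1]]
    sum(bases %in% c("G", "C")) / length(bases)
}

toy_seq <- "ATGCGC"            # made-up sequence for illustration
seq_length <- nchar(toy_seq)   # number of bases in the sequence
seq_gc <- gc_fraction(toy_seq) # 4 of the 6 bases are G or C
```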

Examples

getGeneLengthAndGCContent("ENSG00000012048", "hsa")

EDASeq documentation built on Nov. 8, 2020, 8:29 p.m.