getlength: Retrieves gene length data


View source: R/getlength.R


Gets the length of each gene in a vector.


getlength(genes, genome, id)



genes: A vector or list of the genes for which length information is required.


genome: A string identifying the genome that genes refer to. For a list of supported organisms, run supportedGenomes().


id: A string identifying the gene identifier used by genes. For a list of supported gene IDs, run supportedGeneIDs().


Length data is obtained from the UCSC genome browser for each combination of genome and id. Because fetching this data at runtime is time-consuming, a local copy of the length information for common genomes and gene IDs is included in the geneLenDataBase package. This function uses that package to fetch the required data.

The length of a gene is taken to be the median length of all its mature mRNA transcripts. It is always preferable to obtain length information directly for the gene ID used to summarize your count data, rather than converting IDs and then using the supplied databases. Even when two genes have a one-to-one mapping between different identifier conventions (which is often not the case), they frequently refer to slightly different regions of the genome with different lengths. It is therefore recommended that the user perform the full analysis in terms of only one gene ID, or manually obtain their own length data for the identifier used to bin reads by gene.
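The median rule can be illustrated with a small base R sketch. The transcript table `tx`, its gene names, and its lengths are made up for illustration; this is not the code goseq or geneLenDataBase uses internally, only an example of taking one median per gene.

```r
# Hypothetical transcript table: one row per mature mRNA transcript.
# Gene names and lengths are invented for this example.
tx <- data.frame(
  gene   = c("geneA", "geneA", "geneA", "geneB", "geneB"),
  length = c(1200, 1500, 3000, 800, 1000),
  stringsAsFactors = FALSE
)

# One length per gene: the median over that gene's transcripts.
gene_length <- tapply(tx$length, tx$gene, median)
gene_length
```

Using the median rather than the mean keeps a single unusually long or short transcript from dominating the length assigned to the gene.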


Returns a vector of the gene lengths, in the same order as genes. If length data is unavailable for a particular gene, NA is returned in that position. The returned vector is intended for use with the bias.data option of the nullp function.
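The shape of the return value (same order as the query, NA where no data exists) can be sketched in base R. The lookup table `length_db` and the IDs below are hypothetical stand-ins, and the lookup shown is only an assumption about how such a result could be produced, not a description of the function's implementation.

```r
# Hypothetical length table (illustrative values, not real gene lengths).
length_db <- c(geneA = 1500, geneB = 900, geneC = 2100)

# Query in arbitrary order, including an ID with no length data.
query <- c("geneC", "unknownGene", "geneA")

# match() preserves the query order and gives NA for IDs not in the table.
gene_lengths <- unname(length_db[match(query, names(length_db))])
gene_lengths
```

Because the result lines up position by position with the query, it can be passed alongside the original gene vector without any reordering, and NA entries can be located with `is.na()`.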


Matthew D. Young

See Also

supportedGenomes, supportedGeneIDs, nullp, geneLenDataBase


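genes <- c("ENSG00000124208", "ENSG00000182463", "ENSG00000124201", "ENSG00000124205", "ENSG00000124207")
lengths <- getlength(genes, "hg19", "ensGene")  # assumes the hg19 genome and Ensembl gene IDs, matching the IDs above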

goseq documentation built on Jan. 5, 2019, 7 p.m.