alias2Symbol: Convert Gene Aliases to Official Gene Symbols

Description Usage Arguments Details Value Author(s) See Also Examples

View source: R/alias2Symbol.R

Description

Maps gene alias names to official gene symbols.

Usage

1
2
3
4
alias2Symbol(alias, species = "Hs", expand.symbols = FALSE)
alias2SymbolTable(alias, species = "Hs")
alias2SymbolUsingNCBI(alias, gene.info.file,
                      required.columns = c("GeneID","Symbol","description"))

Arguments

alias

character vector of gene aliases

species

character string specifying the species. Possible values include "Hs" (human), "Mm" (mouse), "Rn" (rat), "Dm" (fly) or "Pt" (chimpanzee), but other values are possible if the corresponding organism package is available.

expand.symbols

logical. This affects those elements of alias that are the official gene symbol for one gene and also an alias for another gene. If FALSE, then these elements will just return themselves. If TRUE, then all the genes for which they are aliases will also be returned.

gene.info.file

either the name of a gene information file downloaded from the NCBI or a data.frame resulting from reading such a file.

required.columns

character vector of columns from the gene information file that are required in the output.

Details

Aliases are mapped via NCBI Entrez Gene identity numbers using Bioconductor organism packages.

alias2Symbol maps a set of aliases to a set of symbols, without necessarily preserving order. The output vector may be longer or shorter than the original vector, because some aliases might not be found and some aliases may map to more than one symbol.

alias2SymbolTable returns of vector of the same length as the vector of aliases. If an alias maps to more than one symbol, then the one with the lowest Entrez ID number is returned. If an alias can't be mapped, then NA is returned.

species can be any character string XX for which an organism package org.XX.eg.db exists and is installed. The only requirement of the organism package is that it contains objects org.XX.egALIAS2EG and org.XX.egSYMBOL linking the aliases and symbols to Entrez Gene Ids. At the time of writing, the following organism packages are available from Bioconductor 3.6:

Package Species
org.Ag.eg.db Anopheles
org.Bt.eg.db Bovine
org.Ce.eg.db Worm
org.Cf.eg.db Canine
org.Dm.eg.db Fly
org.Dr.eg.db Zebrafish
org.EcK12.eg.db E coli strain K12
org.EcSakai.eg.db E coli strain Sakai
org.Gg.eg.db Chicken
org.Hs.eg.db Human
org.Mm.eg.db Mouse
org.Mmu.eg.db Rhesus
org.Pt.eg.db Chimp
org.Rn.eg.db Rat
org.Ss.eg.db Pig
org.Xl.eg.db Xenopus

alias2SymbolUsingNCBI is analogous to alias2SymbolTable but uses a gene-info file from NCBI instead of a Bioconductor organism package. It also gives the option of returning multiple columns from the gene-info file. NCBI gene-info files can be downloaded from ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/GENE_INFO. For example, the human file is ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/GENE_INFO/Mammalia/Homo_sapiens.gene_info.gz and the mouse file is ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/GENE_INFO/Mammalia/Mus_musculus.gene_info.gz.

Value

alias2Symbol and alias2SymbolTable produce a character vector of gene symbols. alias2SymbolTable returns a vector of the same length and order as alias, including NA values where no gene symbol was found. alias2Symbol returns an unordered vector that may be longer or shorter than alias.

alias2SymbolUsingNCBI returns a data.frame with rows corresponding to the entries of alias and columns as specified by required.columns.

Author(s)

Gordon Smyth and Yifang Hu

See Also

This function is often used to assist gene set testing, see 10.GeneSetTests.

Examples

1
2
alias2Symbol(c("PUMA","NOXA","BIM"), species="Hs")
alias2Symbol("RS1", expand=TRUE)

Example output

[1] "BCL2L11" "BBC3"    "PMAIP1" 
[1] "RS1"    "RSC1A1"

limma documentation built on Nov. 8, 2020, 8:28 p.m.