gene_aliases: Gene names, symbols and IDs from HGNC, Entrez and Ensembl

gene_aliases    R Documentation

Gene names, symbols and IDs from HGNC, Entrez and Ensembl

Description

A table of gene IDs, symbols, aliases, previous aliases and names from the HUGO Gene Nomenclature Committee (HGNC), NCBI (Entrez) and Ensembl, corresponding to genome build GRCh38. The data are primarily based on the HGNC gene groups and protein-coding genes tables. Additional, unambiguous gene name aliases from Ensembl (fetched via the Bioconductor package biomaRt) and NCBI (fetched via the NCBI FTP site and the Bioconductor package org.Hs.eg.db) have been added. Mappings between HGNC, Ensembl and NCBI (Entrez) IDs are mostly based on HGNC, with some obsolete Ensembl IDs corrected using the Ensembl data. Ambiguous aliases, i.e. aliases shared by more than one gene, have been removed. Some ambiguous aliases may be protein names, while others are abbreviations with different meanings.
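
The removal of ambiguous aliases described above can be sketched in base R. This is a minimal illustration on a toy table, not the code used to build gene_aliases; the table contents and the object names aliases, genes_per_alias and aliases_clean are hypothetical.

```r
# Toy alias table (hypothetical values, not taken from gene_aliases)
aliases <- data.frame(
  HGNC_ID = c("HGNC:1", "HGNC:2", "HGNC:2", "HGNC:3"),
  value   = c("ABC", "ABC", "XYZ", "QRS"),
  stringsAsFactors = FALSE
)

# Count the distinct genes that each alias points to
genes_per_alias <- tapply(aliases$HGNC_ID, aliases$value,
                          function(ids) length(unique(ids)))

# Keep only aliases that map to exactly one gene
unambiguous <- names(genes_per_alias)[genes_per_alias == 1]
aliases_clean <- aliases[aliases$value %in% unambiguous, ]
```

Here "ABC" is dropped because it is shared by two genes, while "XYZ" and "QRS" are kept.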

Aliases, previous aliases and names have been split so that each row contains one entry, with an additional "symbol_type" column giving the source of the symbol (e.g. HGNC_SYMBOL, alias_symbol). The source tables have been filtered by BIOTYPE to remove pseudogenes, read-through genes, RNA genes, mitochondrial genes and genes of unknown biotype. Only genes located on chromosomes, i.e. not on haplotypes or patches, were fetched from Ensembl, so some genes do not have Ensembl IDs.
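
The one-entry-per-row layout can be illustrated with a short base R sketch. It assumes a hypothetical wide input table in which several aliases are stored in one "|"-separated string; both the separator and the toy values are assumptions made for illustration, not a description of how gene_aliases was actually built.

```r
# Hypothetical wide table: one row per gene, aliases packed into one string
wide <- data.frame(
  HGNC_ID      = c("HGNC:1", "HGNC:2"),
  HGNC_SYMBOL  = c("GENE1", "GENE2"),
  alias_symbol = c("A1|A2", "B1"),
  stringsAsFactors = FALSE
)

# Split the packed aliases into a list with one character vector per gene
alias_list <- strsplit(wide$alias_symbol, "|", fixed = TRUE)

# Stack official symbols and aliases into a long table, one entry per row,
# recording where each entry came from in symbol_type
long <- rbind(
  data.frame(HGNC_ID = wide$HGNC_ID,
             symbol_type = "HGNC_SYMBOL",
             value = wide$HGNC_SYMBOL,
             stringsAsFactors = FALSE),
  data.frame(HGNC_ID = rep(wide$HGNC_ID, lengths(alias_list)),
             symbol_type = "alias_symbol",
             value = unlist(alias_list),
             stringsAsFactors = FALSE)
)
```

Each gene then appears once per symbol or alias, and symbol_type records which kind of entry the value column holds.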

Usage

gene_aliases

Format

A data frame with 131533 rows and 10 variables:

HGNC_ID

HGNC gene ID

ENSEMBL_ID

Ensembl gene ID, from HGNC

UNIPROT_ID

UniProt ID, from HGNC

HGNC_SYMBOL

HGNC gene symbol

ENTREZ_ID

Entrez (NCBI Gene) ID, from HGNC

BIOTYPE

Type of gene, usually from HGNC

symbol_type

Source of the "value" column, e.g. "HGNC_SYMBOL", "HGNC_NAME"

value

A gene symbol, symbol alias, or name

ALT_ID

HGNC ID or other relevant ID for a specific protein modification, isoform or carbohydrate. If no stable ID was found, the Antigen / Clone combination is used.

SOURCE

Source of the data.
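
Because each symbol, alias or name sits in its own row of the value column, a lookup can be a simple row filter. The sketch below runs on a toy data frame that reuses a subset of the column names listed above; the values and the helper lookup_symbol are hypothetical and not part of the package.

```r
# Toy table with a subset of the documented columns (hypothetical values)
toy <- data.frame(
  HGNC_ID     = c("HGNC:1", "HGNC:1", "HGNC:2"),
  HGNC_SYMBOL = c("GENE1", "GENE1", "GENE2"),
  symbol_type = c("HGNC_SYMBOL", "alias_symbol", "HGNC_SYMBOL"),
  value       = c("GENE1", "G1", "GENE2"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the genes whose symbol, alias or name
# matches any of the query strings exactly
lookup_symbol <- function(tbl, query) {
  cols <- c("value", "HGNC_SYMBOL", "HGNC_ID", "symbol_type")
  hits <- tbl[tbl$value %in% query, cols]
  unique(hits)
}

res <- lookup_symbol(toy, "G1")
```

With the package attached, the same helper could presumably be applied to gene_aliases itself, since it only relies on columns documented above.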

Source

https://www.genenames.org/cgi-bin/genegroup/download-all

http://ftp.ebi.ac.uk/pub/databases/genenames/hgnc/tsv/locus_types/gene_with_protein_product.txt


HelenLindsay/AbNames documentation built on June 6, 2023, 1:18 p.m.