To install the development version of inferorg from GitHub (this requires the devtools package), run:
devtools::install_github("camlab-bioml/inferorg")
Load the package:
library(inferorg)
To infer the organism and gene ID format, call the inferorg function. For example, for the human MHC-I genes:
human_mhc_genes <- c("HLA-A", "HLA-B", "HLA-C")
inferorg(human_mhc_genes)
This returns a list with four entries:
organism: the best guess of the organism the symbols correspond to
format: the best guess of the format the symbols correspond to
confidence_organism: the confidence in the guess of the organism
confidence_format: the confidence in the format
For a full list of supported organisms and formats, see supported ID formats and organisms.
Sometimes identifiers match multiple organisms, such as Tap1 and Tap2, which both match mouse and rat. In this case the confidence score is lower, and the recommended organism is whichever matching organism comes first in the preferred order given by supported ID formats and organisms (i.e. human before mouse before fruit fly):
inferorg(c("Tap1", "Tap2"))
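Because ambiguous input lowers the confidence, it can be useful to check the confidence entries before trusting the inferred organism. The following is a minimal sketch rather than part of inferorg: is_confident is a hypothetical helper, the 0.5 threshold is arbitrary, the mocked values are invented, and it assumes the confidence entries are numeric with larger values meaning more certainty.

```r
# Hypothetical helper, not part of inferorg: accept an inference only when
# both confidence values reach a threshold. Assumes numeric confidences where
# larger means more certain; the 0.5 default is arbitrary.
is_confident <- function(res, threshold = 0.5) {
  isTRUE(res$confidence_organism >= threshold) &&
    isTRUE(res$confidence_format >= threshold)
}

# Mocked result with the four documented entries; the values are invented
# for illustration and are not real inferorg output.
mock_res <- list(
  organism = "mouse",
  format = "symbol",
  confidence_organism = 0.4,
  confidence_format = 0.9
)

if (!is_confident(mock_res)) {
  message("Low-confidence inference for organism '", mock_res$organism,
          "'; consider supplying more genes.")
}
```

In real use, mock_res would be replaced by the list returned from inferorg.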
The confidence scores for each organism and format are as follows:
To convert automatically between formats, we use the autoconvert function. Under the hood, this calls the inferorg function to work out the gene ID format and organism before converting to the desired format (for that organism).
For example, if we wish to convert the genes of the human MHC-I complex to Ensembl IDs, we can call:
human_mhc_genes <- c("HLA-A", "HLA-B", "HLA-C")
autoconvert(human_mhc_genes, to = "ensgene")
Similarly, we can convert the genes Tap1 and Tap2 in mouse to their Entrez IDs:
mouse_tap_genes <- c("Tap1", "Tap2")
autoconvert(mouse_tap_genes, to = "entrez")
and we can convert these back to symbols:
autoconvert(c(21354, 21355), to = "symbol")
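The infer-then-convert behaviour described for autoconvert can be pictured with a toy lookup table. This is only an illustrative sketch and not how inferorg is implemented: toy_table, toy_infer_format and toy_autoconvert are hypothetical names, the table holds just the two mouse genes from the example, and using the fraction of matched identifiers as a confidence is an assumption made for the illustration.

```r
# Toy lookup table (hypothetical, two genes only). The symbol/Entrez pairs
# follow the Tap1/Tap2 example, assuming the IDs are listed in the same
# order as the symbols.
toy_table <- data.frame(
  organism = c("mouse", "mouse"),
  symbol   = c("Tap1", "Tap2"),
  entrez   = c("21354", "21355"),
  stringsAsFactors = FALSE
)

# Step 1: guess the input format as the column matching the most identifiers.
toy_infer_format <- function(ids, table = toy_table) {
  ids <- as.character(ids)
  formats <- c("symbol", "entrez")
  frac <- vapply(formats, function(f) mean(ids %in% table[[f]]), numeric(1))
  list(format = formats[which.max(frac)], confidence_format = max(frac))
}

# Step 2: convert from the inferred format to the requested one,
# returning NA when nothing in the input matches any known format.
toy_autoconvert <- function(ids, to, table = toy_table) {
  guess <- toy_infer_format(ids, table)
  if (guess$confidence_format == 0) return(NA)
  table[[to]][match(as.character(ids), table[[guess$format]])]
}

toy_autoconvert(c("Tap1", "Tap2"), to = "entrez")
toy_autoconvert(c(21354, 21355), to = "symbol")
```

The point of the sketch is the ordering of the two steps: the caller only names the target format, and the source format is worked out from the identifiers themselves.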
Note that if the gene ID format and/or organism can't be confidently inferred or any of the genes provided can't be confidently mapped, an NA is returned:
autoconvert(c("fake", "gene"))
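Since a failed conversion yields NA rather than an error, downstream code may want to check for it explicitly. A minimal sketch follows, assuming the result is either a single NA or a vector that may contain NA entries; check_converted is a hypothetical helper, not part of inferorg, and the inputs here are mocked stand-ins for autoconvert output.

```r
# Hypothetical helper, not part of inferorg: stop early if a conversion
# result is NA or contains NA entries.
check_converted <- function(converted) {
  if (anyNA(converted)) {
    stop("Conversion failed or was incomplete; inspect the input identifiers.")
  }
  converted
}

# Mocked results standing in for autoconvert() output.
ok <- check_converted(c("21354", "21355"))
failed <- tryCatch(check_converted(NA), error = function(e) conditionMessage(e))
```

In real use, the value passed to check_converted would be the return value of an autoconvert call.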
But be careful! Made-up identifiers will sometimes match real genes, which is especially an issue for very small input gene sets:
autoconvert(c("made", "up", "gene"), to = "ensgene")
The following organisms are supported:
human
mouse
fruit_fly
macaque
worm
chicken
rat
and the following gene ID formats:
symbol: HGNC symbol
ensgene: Ensembl gene ID
entrez: Entrez gene ID
print(sessionInfo())