knitr::opts_chunk$set(echo = TRUE)
library(knitr)
library(badger)
library(magrittr)
devtools::load_all()

homologene

Build Status codecov r badge_cran_release('homologene',color = '#32BD36') r badge_devel("oganm/homologene", "blue")

An r package that works as a wrapper to homologene

Available species are

homologene::taxData

Installation

install.packages('homologene')

or

devtools::install_github('oganm/homologene')

Usage

Basic homologene function requires a list of gene symbols or NCBI ids, and an inTax and an outTax. In this example, inTax is the taxon id of mus musculus while outTax is for humans.

homologene(c('Eno2','Mog'), inTax = 10090, outTax = 9606)

homologene(c('Eno2','17441'), inTax = 10090, outTax = 9606)

For mouse and humans two convenience functions exist that removes the need to provide taxonomic identifiers. Note that the column names are not the same as the homologene output.

mouse2human(c('Eno2','Mog'))
human2mouse(c('ENO2','MOG','GZMH'))

homologeneData2

Original homologene database has not been updated since 2014. This package also includes an updated version of the homologene database that replaces gene symbols and identifiers with the their latest version. For the procedure followed for updating, see this blog post and/or see the processing code.

Using the updated version can help you match genes that cannot matched due to out of date annotations.

mouse2human(c('Mesd',
              'Trp53rka',
              'Cstdc4',
              'Ifit3b'))


mouse2human(c('Mesd',
              'Trp53rka',
              'Cstdc4',
              'Ifit3b'),
            db = homologeneData2)

The homologeneData2 object that comes with the GitHub version of this package is updated weekly but if you are using the CRAN version and want the latest annotations, or if you want to keep a frozen version homologene, you can use the updateHomologene function.

homologeneDataVeryNew = updateHomologene() # update the homologene database with the latest identifiers

mouse2human(c('Mesd',
              'Trp53rka',
              'Cstdc4',
              'Ifit3b'),
            db = homologeneDataVeryNew)

Gene ID syncronization

The package also includes functions that were used to create the homologeneData2, for updating outdated gene symbols and identifiers.

library(dplyr)

gene_history = getGeneHistory()
oldIds = c(4340964, 4349034, 4332470, 4334151, 4323831)
newIds = updateIDs(oldIds,gene_history)
print(newIds)
# get the latest gene symbols for the ids

gene_info = getGeneInfo()

gene_info %>%
    dplyr::filter(GeneID %in% as.integer(newIds)) # faster to match integers

Querying DIOPT

Instead of using just homologene, one can also make queries into the DIOPT database. Diopt uses multiple databases to find gene homolog/orthologues. Note that this function has a delay parameter that is set to 10 seconds by default. This was done to obey the robots.txt of their website.

diopt(c('GZMH'),inTax = 9606, outTax = 10090) %>% 
    knitr::kable()

diopt(c('Eno2','Mog'),inTax = 10090, outTax =9606) %>%
    knitr::kable()

Mishaps

As of version version 1.1.68, the output now includes NCBI ids. Since it doesn't change any of the existing column names or their order, this shouldn't cause problems in most use cases.

If a you can't find a gene you are looking for it may have synonyms. See geneSynonym package to find them. If you have other problems open an issue or send a mail.



oganm/homologene documentation built on Nov. 5, 2023, 8:39 a.m.