knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.path = "README-" )
{homologiser} provides a simple route to obtain to obtain the gene IDs for the
homologues of a set of genes. It calls {biomaRt}. It provides one exported
function (map_to_homologues
) that should be used for cross-species ID
mapping.
But using {biomaRt} is not that difficult, so why write a wrapper around it that has a really narrow purpose?
A) {homologiser} is simpler
B) you might (I frequently do) need to disregard those genes that have no
homologues, or multiple homologues in the target species, or that map to a gene
that has multiple homologues in the source species: to restrict to just these
genes, you can use map_to_homologues(blah, blah, ..., one_to_one = TRUE)
.
This is a basic example which shows you how to solve a common problem:
# Let's face it, you're all pinning your analyses to a specific database # for reproducibility, aren't you? ensembl_v84 <- "http://Mar2016.archive.ensembl.org" human_biomart <- biomaRt::useMart( biomart = "ensembl", host = ensembl_v84, dataset = "hsapiens_gene_ensembl" ) mouse_biomart <- biomaRt::useMart( biomart = "ensembl", host = ensembl_v84, dataset = "mmusculus_gene_ensembl" )
# A selection of human IDs human_genes <- c("ENSG00000134294", "ENSG00000284192", "ENSG00000002726") # Get the ensembl IDs for mouse homologues: homologiser::map_to_homologues( gene_ids = human_genes, dataset_sp1 = human_biomart, sp1 = "hsapiens", idtype_sp1 = "ensembl_gene_id", dataset_sp2 = mouse_biomart, sp2 = "mmusculus", idtype_sp2 = "ensembl_gene_id" )
Note that
"ENSGxxxxxx2726" maps to several mouse genes
"ENSGxxxx134294" maps to a single mouse gene
"ENSGxxxx284192" maps to no mouse genes
although that function-call was a bit wordy, human -> mouse is the default
direction and ensembl-gene is the default ID-type, so we only really needed
to type map_to_homologues(human_genes, human_biomart, mouse_biomart)
What if we only want to consider those homologue pairings where there is a single human gene mapping to/from a single mouse gene:
# Get the ensembl IDs for mouse homologues: homologiser::map_to_homologues( gene_ids = human_genes, dataset_sp1 = human_biomart, dataset_sp2 = mouse_biomart, one_to_one = TRUE )
Now, since x2726 (one-to-many) and x284192 (one-to-zero) don't map one-to-one, they have a missing value in the returned data-frame.
What if a gene is part of a set of genes that map many-to-one? For example, this is one of the mouse genes that "ENSGxx2726" maps to:
mouse_gene <- "ENSMUSG00000029811" homologiser::map_to_homologues( gene_ids = mouse_gene, dataset_sp1 = mouse_biomart, sp1 = "mmusculus", dataset_sp2 = human_biomart, sp2 = "hsapiens", one_to_one = FALSE ) homologiser::map_to_homologues( gene_ids = mouse_gene, dataset_sp1 = mouse_biomart, sp1 = "mmusculus", dataset_sp2 = human_biomart, sp2 = "hsapiens", one_to_one = TRUE )
Since that mouse gene is part of a many-to-one mapping, it does not have any homology partners when we restrict to one-to-one mappings (but it's human homologue is included when we are less strict).
Note that you can use either "ensembl_gene_id" or "entrezgene" as the "idtype"
# ensembl-mouse to entrez-human homologiser::map_to_homologues( gene_ids = mouse_gene, dataset_sp1 = mouse_biomart, sp1 = "mmusculus", dataset_sp2 = human_biomart, sp2 = "hsapiens", idtype_sp2 = "entrezgene", one_to_one = FALSE ) # entrez-human to ensembl-mouse homologiser::map_to_homologues( gene_ids = c("10000", "1234"), # AKT3 and CCR5 dataset_sp1 = human_biomart, idtype_sp1 = "entrezgene", dataset_sp2 = mouse_biomart, one_to_one = TRUE )
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.