It is often necessary to map an abbreviated gene name, such as act-1 for actin, to its unique identifier, or accession number, in a given database such as WormBase. This vignette shows how to query the database by gene name and retrieve the other IDs associated with a given gene or set of genes.
knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
library(ParasiteXML)
library(biomaRt)
Go to https://parasite.wormbase.org/biomart to design your query. Note: A path such as "martview/71228acdfe812347da" is appended to the URL automatically after the page loads, so you do not need to type it yourself.
Query filters are the database fields you use to select the data.
The image below shows six general categories: SPECIES, REGION, GENE, GENE ONTOLOGY (GO), HOMOLOGY (ORTHOLOGUES AND PARALOGUES), and PROTEIN DOMAINS. We will be using SPECIES and GENE in order to translate IDs.
Be sure to select C. elegans here, because some IDs are reused in other species.
These are some gene names to translate. You can expand this list in R.
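As a minimal sketch of expanding the list in R, the names can be kept in a character vector and collapsed into the comma-separated form that BioMart uses for a filter value in its generated XML. The extra gene name "unc-54" and the variable names are illustrative assumptions, not part of the original query.

```r
# Gene names to translate; "unc-54" is an illustrative addition.
genes <- c("act-1", "des-2", "pha-4", "unc-54")

# Trim stray whitespace and drop duplicates before building the filter value.
genes <- unique(trimws(genes))

# BioMart takes multiple values for one filter as a single comma-separated string.
gene_filter_value <- paste(genes, collapse = ",")

# The filter element in the form used by the generated XML query.
gene_filter <- sprintf('<Filter name = "gene_name" value = "%s"/>', gene_filter_value)
print(gene_filter)
```

Building the value from a vector keeps the gene list easy to edit or to read from a file, without hand-editing the XML text.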
The output attributes are the data you are requesting from WormBase ParaSite. Here, they are all of the other types of IDs available. Clicking on 2. Output Attributes --> Gene reveals the following:
Notice that we also selected gene name as output. This is important because the input query
Now, the selected query will translate the gene name to all of the selected alternate names.
Clicking the XML button generates the query from the web input.
The output is the following text:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Query>
<Query virtualSchemaName = "parasite_mart" formatter = "TSV" header = "0" uniqueRows = "0" count = "" datasetConfigVersion = "0.6" >
  <Dataset name = "wbps_gene" interface = "default" >
    <Filter name = "gene_name" value = "act-1,des-2,pha-4"/>
    <Filter name = "species_id_1010" value = "caelegprjna13758"/>
    <Attribute name = "production_name_1010" />
    <Attribute name = "wbps_gene_id" />
    <Attribute name = "external_gene_id" />
    <Attribute name = "embl" />
    <Attribute name = "entrezgene_id" />
    <Attribute name = "entrezgene_name" />
  </Dataset>
</Query>
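Wrap the XML in single quotes when assigning it to an R string, because the XML text itself contains double quotes. Otherwise, the embedded double quotes end the string early and R reports multiple syntax errors.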
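To illustrate the quoting rule, the sketch below writes one line of the XML three ways and checks that they produce the same string. The variable names are hypothetical, and the raw-string form assumes R 4.0.0 or later.

```r
# Single quotes: the embedded double quotes need no escaping.
single_quoted <- '<Dataset name = "wbps_gene" interface = "default" >'

# Double quotes: every embedded double quote must be escaped with a backslash.
escaped <- "<Dataset name = \"wbps_gene\" interface = \"default\" >"

# Raw string (R >= 4.0.0): nothing between r"( and )" needs escaping.
raw_string <- r"(<Dataset name = "wbps_gene" interface = "default" >)"

# All three spellings yield the same character string.
print(identical(single_quoted, escaped))
print(identical(single_quoted, raw_string))
```

Single quotes are the least intrusive choice here because the XML copied from the web page can be pasted unchanged.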
query = '<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Query>
<Query virtualSchemaName = "parasite_mart" formatter = "TSV" header = "0" uniqueRows = "0" count = "" datasetConfigVersion = "0.6" >
  <Dataset name = "wbps_gene" interface = "default" >
    <Filter name = "gene_name" value = "act-1,des-2,pha-4"/>
    <Filter name = "species_id_1010" value = "caelegprjna13758"/>
    <Attribute name = "production_name_1010" />
    <Attribute name = "wbps_gene_id" />
    <Attribute name = "external_gene_id" />
    <Attribute name = "embl" />
    <Attribute name = "entrezgene_id" />
    <Attribute name = "entrezgene_name" />
  </Dataset>
</Query>'
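One way such a query string could be submitted is sketched below using base R only. This is not necessarily how ParasiteXML does it: the "martservice" endpoint path is an assumption based on the usual BioMart REST convention, and fetch_ids is a hypothetical helper. The sketch rebuilds a compact copy of the query so that it runs on its own, and it leaves the network call commented out.

```r
# Compact copy of the query so this sketch is self-contained.
query <- paste0(
  '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>',
  '<Query virtualSchemaName = "parasite_mart" formatter = "TSV" header = "0" ',
  'uniqueRows = "0" count = "" datasetConfigVersion = "0.6" >',
  '<Dataset name = "wbps_gene" interface = "default" >',
  '<Filter name = "gene_name" value = "act-1,des-2,pha-4"/>',
  '<Filter name = "species_id_1010" value = "caelegprjna13758"/>',
  '<Attribute name = "production_name_1010" />',
  '<Attribute name = "wbps_gene_id" />',
  '<Attribute name = "external_gene_id" />',
  '<Attribute name = "embl" />',
  '<Attribute name = "entrezgene_id" />',
  '<Attribute name = "entrezgene_name" />',
  '</Dataset></Query>'
)

# Assumed endpoint: the conventional BioMart "martservice" path on the ParaSite host.
martservice <- "https://parasite.wormbase.org/biomart/martservice"

# Percent-encode the whole XML document so it can travel as a URL parameter.
request_url <- paste0(martservice, "?query=", utils::URLencode(query, reserved = TRUE))

# The query sets header = "0", so the TSV has no header row.
# Recover column names from the Attribute elements, in query order.
attribute_names <- regmatches(
  query,
  gregexpr('(?<=<Attribute name = ")[^"]+', query, perl = TRUE)
)[[1]]

# Hypothetical helper: read the tab-separated response into a data frame.
fetch_ids <- function(url, columns) {
  utils::read.delim(url, header = FALSE, col.names = columns,
                    stringsAsFactors = FALSE)
}

# Requires network access, so it is not run here:
# result <- fetch_ids(request_url, attribute_names)
```

Taking the column names from the query itself keeps the data frame labels in step with the attributes whenever the attribute list is edited.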