An interface for the ‘Neo4j’ database providing mapping between different identifiers of biological entities. This Biological Entity Dictionary (BED) has been developed to address three main challenges.
The first one is related to the completeness of identifier mappings. Direct mapping information provided by the different systems is not always complete and can be enriched by mappings provided by other resources. More interestingly, direct mappings not identified by any of these resources can be inferred indirectly through mappings to a third reference. For example, many human Ensembl gene IDs are not directly mapped to any Entrez gene ID, but such mappings can be inferred using their respective mappings to HGNC IDs.
The second challenge is related to the mapping of deprecated identifiers. Entity identifiers can change from one resource release to another. The identifier history is provided by some resources, such as Ensembl or the NCBI, but it is generally not used by mapping tools.
The third challenge is related to the automation of the mapping process according to the relationships between the biological entities of interest. Mapping between gene and protein ID scopes should not be done the same way as mapping between two gene ID scopes. Also, converting identifiers from different organisms should be possible using gene ortholog information.
A ready-to-use database is provided as a ‘Docker’ image: https://hub.docker.com/r/patzaw/bed-ucb-human/. The method has been published by Godard and van Eyll (2018).
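The principle of indirect mapping through a third reference can be illustrated with a toy example in base R. This is only a sketch of the idea, not how BED implements it: the identifiers below are hypothetical placeholders (not real Ensembl, HGNC or Entrez IDs), and the data frames stand in for cross-reference tables provided by different resources.

```r
# Hypothetical cross-reference tables (placeholder identifiers, not real data)
ens2hgnc <- data.frame(
  ensembl = c("ENSG_A", "ENSG_B", "ENSG_C"),
  hgnc    = c("HGNC_1", "HGNC_2", "HGNC_3"),
  stringsAsFactors = FALSE
)
hgnc2entrez <- data.frame(
  hgnc   = c("HGNC_1", "HGNC_3"),
  entrez = c("ENTREZ_10", "ENTREZ_30"),
  stringsAsFactors = FALSE
)
# Direct Ensembl -> Entrez mappings provided by some resource (only one here)
ens2entrez_direct <- data.frame(
  ensembl = "ENSG_A",
  entrez  = "ENTREZ_10",
  stringsAsFactors = FALSE
)

# Infer Ensembl -> Entrez mappings by joining both tables on the shared HGNC ID
inferred <- merge(ens2hgnc, hgnc2entrez, by = "hgnc")[, c("ensembl", "entrez")]

# Enrich the direct mappings with the inferred ones, dropping duplicates
all_maps <- unique(rbind(ens2entrez_direct, inferred))
all_maps <- all_maps[order(all_maps$ensembl), ]
rownames(all_maps) <- NULL
print(all_maps)
```

In this toy case, ENSG_C gains an Entrez mapping that no direct table provided, while ENSG_B stays unmapped because its HGNC ID has no Entrez counterpart. BED itself relies on the Neo4j graph; the in-memory join above only mimics the principle.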
BED is available on CRAN and can be installed with:
install.packages("BED")
The following R packages, available on CRAN, are required:
And these are suggested:
It can also be installed from GitHub:
devtools::install_github("patzaw/BED")
If you get an error like the following…
Error: package or namespace load failed for ‘BED’:
.onLoad failed in loadNamespace() for 'BED', details:
call: connections[[connection]][["cache"]]
error: subscript out of bounds
… remove the BED folder located here (the following command returns TRUE if it exists):
file.exists(file.path(Sys.getenv("HOME"), "R", "BED"))
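A minimal way to do this from R, assuming the folder is at the default location checked above, is to delete it with base R's `unlink()`:

```r
# Default location of the local BED folder (as checked above)
bedDir <- file.path(Sys.getenv("HOME"), "R", "BED")

# Delete the folder and everything it contains, only if it exists
if (dir.exists(bedDir)) {
  unlink(bedDir, recursive = TRUE)
}
```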
Documentation is provided in the BED vignette.
A public instance of the BED Neo4j database is provided for convenience and can be reached as follows:
library(BED)
connectToBed("https://genodesy.org/BED/", remember=TRUE, useCache=TRUE)
findBeids()
This package and the underlying research have been published in this peer-reviewed article:
An instance of the BED database (UCB-Human) has been built using the script provided in the BED R package.
This instance is focused on Homo sapiens, Mus musculus, Rattus norvegicus, Sus scrofa and Danio rerio organisms. It has been built from the following resources:
The Neo4j graph database is available as a dump file shared on Zenodo.
The following shell commands can be adapted to user needs and run to get a Neo4j container serving a BED database instance.
#!/bin/sh
####################################################@
## Config ----
export BED_VERSION=2024.01.14
export NJ_VERSION=5.15.0
export BED_HTTP_PORT=5454
export BED_BOLT_PORT=5687
export CONTAINER=bed
export BED_REP_URL=https://zenodo.org/records/10521413/files/
export BED_DUMPS=~/.cache/BED/neo4jDump
export BED_DATA=~/.cache/BED/neo4jData
####################################################@
## Check folders ----
if test -e "$BED_DATA"; then
   echo "$BED_DATA directory exists ==> abort - Remove it before proceeding" >&2
   # Exit with a non-zero status so that callers can detect the failure
   exit 1
fi
mkdir -p "$BED_DATA"
if test -e "$BED_DUMPS"; then
   echo "$BED_DUMPS directory exists ==> abort - Remove it before proceeding" >&2
   exit 1
fi
mkdir -p "$BED_DUMPS"
####################################################@
## Download data ----
# BED_REP_URL already ends with "/", so no extra separator is added
wget "${BED_REP_URL}dump-bed-ucb-human-${BED_VERSION}.dump" -O "$BED_DUMPS/neo4j.dump"
####################################################@
## Import data ----
# Load neo4j.dump from /backups into the "neo4j" database stored in $BED_DATA/data
docker run --interactive --tty --rm \
--volume=$BED_DATA/data:/data \
--volume=$BED_DUMPS:/backups \
neo4j/neo4j-admin:$NJ_VERSION \
neo4j-admin database load neo4j --from-path=/backups
####################################################@
## Start neo4j ----
# Start Neo4j in the background, publishing the HTTP and Bolt ports
docker run -d \
--name $CONTAINER \
--publish=$BED_HTTP_PORT:7474 \
--publish=$BED_BOLT_PORT:7687 \
--env=NEO4J_dbms_memory_heap_initial__size=4G \
--env=NEO4J_dbms_memory_heap_max__size=4G \
--env=NEO4J_dbms_memory_pagecache_size=4G \
--env=NEO4J_dbms_read__only=true \
--env=NEO4J_AUTH=none \
--volume $BED_DATA/data:/data \
--volume $BED_DATA/logs:/var/lib/neo4j/logs \
--restart=always \
neo4j:$NJ_VERSION
Building and feeding a BED database instance is achieved using scripts available in the “supp/Build” folder.
Start a new BED container using the S01-NewBED-Container.sh script.
Feed the database using the S02-Rebuild-BED.sh script, which compiles the Rebuild-BED.Rmd document.
Dump the database using the S03-Dump-BED.sh script.
Sergio Espeso-Gil has reported stability issues with Docker images on Windows. They are mostly solved by checking the “Use the WSL2 based engine” option in the Docker settings. More information is provided here: https://docs.docker.com/docker-for-windows/wsl/