knitr::opts_chunk$set(echo = TRUE)
This doc describes how to add resources to AnnotationHub. In general, these instructions pertain to core team members only.
This requires generating two types of resources: Ensembl GTF files and Ensembl TwoBit (2bit) files.
On Local Machine:
Navigate to the AnnotationHub_docker directory or create such a directory by following the instructions here.
Start the docker:
```sh
export MYSQL_REMOTE_PASSWORD=***   # See credentials doc
sudo docker-compose up
```
```r
options(AH_SERVER_POST_URL="http://localhost:3000/resource")
options(ANNOTATION_HUB_URL="http://localhost:3000")
url <- getOption("AH_SERVER_POST_URL")
library(AnnotationHubData)

# Since this grabs the file and converts on the fly,
# there is actually no need to run with metadataOnly=FALSE.
#
# This periodically fails. We realized that after a certain number
# of subsequent hits to the Ensembl ftp site, the site starts asking
# for a username:password or simply blocks entirely.
# We have tried increasing the sleep time in between failed attempts
# with a max retry of 3. Normally this function will run completely
# after 1-3 attempts.
# In the future, maybe consider adding a conditional userpwd argument to getURL.
meta <- updateResources(getwd(),
                        BiocVersion = "3.6",
                        preparerClasses = "EnsemblGtfImportPreparer",
                        metadataOnly = TRUE, insert = FALSE,
                        justRunUnitTest = FALSE,
                        release = "89")

# test/check meta

pushMetadata(meta, url)
# You could rerun updateResources with insert=TRUE to do the push,
# but I like to check the resource data first.
```
exit R
Convert db to sqlite (puts the file in the data/ directory)
sudo docker exec annotationhub_annotationhub_1 bash /bin/backup_db.sh
On EC2 Instance:
The files will be downloaded, converted, and pushed to the S3 bucket. This should be done on the EC2 instance val_annotations. If it is not running, start the EC2 instance on AWS and log in as user ubuntu.
Because this can take a while, it is recommended to use the screen application. Some useful screen commands:

- Start screen by typing 'screen'
- cd to the directory you want to be in, then start the process or code you want to run
- Detach from the screen session with 'ctrl-a' 'd'
- List screen sessions with 'screen -ls'
- Reconnect to a specific session (e.g., XYZ) with 'screen -r XYZ'
```r
library(AnnotationHubData)

# To populate the S3 bucket, metadataOnly must be FALSE.
# metadataOnly controls population of the S3 bucket;
# insert controls inserting into the database.
meta <- updateResources(getwd(),
                        BiocVersion = "3.6",
                        preparerClasses = "EnsemblTwoBitPreparer",
                        metadataOnly = FALSE, insert = FALSE,
                        justRunUnitTest = FALSE,
                        release = "89")

# A suggested step is to save(meta, file="metadataForTwoBit")
# and scp it to your local machine.
```
On Local Machine:
Navigate to the AnnotationHub_docker directory
Start the docker:
```sh
export MYSQL_REMOTE_PASSWORD=***   # See credentials doc
sudo docker-compose up
```
```r
options(AH_SERVER_POST_URL="http://localhost:3000/resource")
options(ANNOTATION_HUB_URL="http://localhost:3000")
url <- getOption("AH_SERVER_POST_URL")
library(AnnotationHubData)

# option 1:
meta <- updateResources(getwd(),
                        BiocVersion = "3.6",
                        preparerClasses = "EnsemblTwoBitPreparer",
                        metadataOnly = TRUE, insert = FALSE,
                        justRunUnitTest = FALSE,
                        release = "89")

# option 2: if you saved the meta from the EC2 instance
load("metadataForTwoBit")

pushMetadata(meta, url)
# You could rerun updateResources with insert=TRUE to do the push,
# but I like to check the resource data first.
```
exit R
Convert db to sqlite (puts the file in the data/ directory)
sudo docker exec annotationhub_annotationhub_1 bash /bin/backup_db.sh
After the GTF or 2bit files are added to the production database, you should be able to get the resources and see the updated timestamp of the database with something like the following:
```r
library(AnnotationHub)
hub <- AnnotationHub()
length(query(hub, c("ensembl", "gtf", "release-89")))
length(query(hub, c("fasta", "release-89", "twobit")))
```
Contributors will generally reach out when they want to update or add annotations to the AnnotationHub. In the past they have provided the annotations through an application like Dropbox; we have since updated the process, and contributors now upload files directly to S3 in a temporary location. Send them the instructions found here.
A key will have to be generated for them to access and use the AnnotationContributor account. Go here and, under the 'Security credentials' tab, click 'Create access key'. Send the Access key ID to the contributor; the Secret access key is stored in AWS. When the contributor is done you can delete the key (small 'x' at the right of the key row).
Advise that their data should be in a directory with the same name as the software package that will access the annotations; using subdirectories to keep track of versions is strongly encouraged.
Once the data is uploaded to S3, move it to the proper location.
We will need a copy of the package to generate and test the annotations. Request a link to the package from the user.
Follow instructions here
In general, generate the list of AnnotationHubMetadata objects with makeAnnotationHubMetadata() or updateResources(). To test that the metadata.csv is properly formatted, run makeAnnotationHubMetadata().
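A minimal sketch of that validation step, assuming a hypothetical contributed package checked out at /path/to/ContributedPackage (the path and file name are placeholders):

```r
library(AnnotationHubData)

# Parse the package's inst/extdata/metadata.csv; this errors if required
# fields are missing or malformed, which is the quickest formatting check.
meta <- makeAnnotationHubMetadata("/path/to/ContributedPackage",
                                  fileName = "metadata.csv")
meta[[1]]  # inspect the resulting AnnotationHubMetadata objects
```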
Some suggested testing procedures can be found here
When satisfied, start the AnnotationHub docker and add the resource to it.
Navigate to the AnnotationHub_docker directory
Start the docker:
```sh
export MYSQL_REMOTE_PASSWORD=***   # See credentials doc
sudo docker-compose up
```
```r
options(AH_SERVER_POST_URL="http://localhost:3000/resource")
options(ANNOTATION_HUB_URL="http://localhost:3000")
library(AnnotationHubData)
url <- getOption("AH_SERVER_POST_URL")

# Run the appropriate makeAnnotationHubMetadata() call and then
# pushMetadata(meta[[1]], url)
```
exit R
Test: from the list of running dockers (sudo docker ps), find the process with db in the name, for example test_db1. Connect to the container with sudo docker exec -ti test_db bash. Log into mysql with mysql -p -u ahuser (the password will be the same as the exported MYSQL_REMOTE_PASSWORD). Explore with mysql commands like select * from resources order by id desc limit 5;
Convert the db to sqlite (puts the file in the data/ directory). Which command to run will depend on the name of the process, but it should be one of the following:
```sh
sudo docker exec annotationhub_annotationhub_1 bash /bin/backup_db.sh
sudo docker exec annotationhub_docker_annotationhub_1 bash /bin/backup_db.sh
```
This recipe should be run after the new OrgDb packages for the release have been built and are available in the devel repo. The code essentially loads the current packages, extracts the sqlite file, and creates some basic metadata.
The BiocVersion should be whatever the next release version will be, i.e., the current devel version that is about to become the release. The OrgDb resources get the same name when they are regenerated - they aren't tied to a genome build, so that's not a distinguishing feature in the title. We only want one OrgDb for each species available in a release, and the BiocVersion is what we use to filter the records exposed.
On AWS:
Create S3 bucket based on rdatapath: annotationhub/ncbi/standard/
On Local Machine:
Navigate to the AnnotationHub_docker directory
Start the docker:
```sh
export MYSQL_REMOTE_PASSWORD=***   # See credentials doc
sudo docker-compose up
```
```r
options(AH_SERVER_POST_URL="http://localhost:3000/resource")
options(ANNOTATION_HUB_URL="http://localhost:3000")
url <- getOption("AH_SERVER_POST_URL")
library(AnnotationHubData)

# See the man page for clarification on testing and actively pushing:
# ?makeStandardOrgDbsToAHM
# To populate the S3 bucket, metadataOnly must be FALSE.
# metadataOnly controls population of the S3 bucket;
# insert controls inserting into the database.
meta <- updateResources(getwd(),
                        BiocVersion = "3.5",
                        preparerClasses = "OrgDbFromPkgsImportPreparer",
                        metadataOnly = TRUE, insert = FALSE,
                        justRunUnitTest = FALSE,
                        downloadOrgDbs = TRUE) # downloadOrgDbs can be FALSE for subsequent runs

pushMetadata(meta, url)
```
exit R
Convert db to sqlite (puts the file in the data/ directory)
sudo docker exec annotationhub_annotationhub_1 bash /bin/backup_db.sh
If satisfied, copy this file to annotationhub.bioconductor.org and follow the instructions for updating the production database.
After uploading to production, you can test that the resources are available in release and devel:
query(hub, "OrgDb") table(mcols(query(hub, "OrgDb"))$rdatadateadded)
If testing before the release occurs, debug(AnnotationHub:::.uid0) so that query2's biocversion is the desired test version, or try setAnnotationHubOption(TESTING=TRUE).
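A hedged sketch of the second approach, assuming the (option, value) calling form of setAnnotationHubOption (check ?setAnnotationHubOption if it differs):

```r
library(AnnotationHub)
# Flag the session as a test session so that records for the upcoming
# BiocVersion are exposed (per the note above), then query as usual.
setAnnotationHubOption("TESTING", TRUE)
hub <- AnnotationHub()
query(hub, "OrgDb")
```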
This recipe should be run after the new TxDbs have been built and are in the devel repo. The code loads the packages, extracts the sqlite file and creates metadata.
The BiocVersion should be whatever the next release version will be, i.e., the current devel version that is about to become the release. The OrgDb resources get the same name when they are regenerated - they aren't tied to a genome build, so that's not a distinguishing feature in the title. We only want one OrgDb for each species available in a release, and the BiocVersion is what we use to filter the records exposed.
On AWS:
Create S3 bucket based on rdatapath: annotationhub/ucsc/standard/
On Local Machine:
Navigate to the AnnotationHub_docker directory
Start the docker:
```sh
export MYSQL_REMOTE_PASSWORD=***   # See credentials doc
sudo docker-compose up
```
```r
options(AH_SERVER_POST_URL="http://localhost:3000/resource")
options(ANNOTATION_HUB_URL="http://localhost:3000")
url <- getOption("AH_SERVER_POST_URL")
library(AnnotationHubData)

# See the man page for clarification on testing and actively pushing;
# follow the example in the man page. You will need the list of updated
# or added TxDbs to make a character vector of files.
?makeStandardTxDbsToAHM

# The following will hopefully be implemented in the future:
# meta <- updateResources(getwd(),
#                         BiocVersion = "3.5",
#                         preparerClasses = "TxDbFromPkgsImportPreparer",
#                         metadataOnly = TRUE, insert = FALSE,
#                         justRunUnitTest = FALSE,
#                         downloadTxDbs = TRUE) # downloadTxDbs can be FALSE for subsequent runs

# meta here comes from following the ?makeStandardTxDbsToAHM example
pushMetadata(meta, url)
```
exit R
Convert db to sqlite (puts the file in the data/ directory)
sudo docker exec annotationhub_annotationhub_1 bash /bin/backup_db.sh
If satisfied, copy this file to annotationhub.bioconductor.org and follow the instructions for updating the production database.
After uploading to production, you can test that the resources are available in release and devel:
query(hub, "TxDb") table(mcols(query(hub, "TxDb"))$rdatadateadded)
This code generates ~1400 non-standard OrgDb sqlite files from NCBI. These are less comprehensive than the standard OrgDb packages. It's best to run this on an EC2 instance; you can run it locally if your machine has enough space to download the files from NCBI, but keep in mind this code takes several hours to run.
The BiocVersion should be whatever the next release version will be, i.e., the current devel version that is about to become the release. The OrgDb resources get the same name when they are regenerated - they aren't tied to a genome build, so that's not a distinguishing feature in the title. We only want one OrgDb for each species available in a release, and the BiocVersion is what we use to filter the records exposed.
Before running, make sure the AnnotationForge/inst/extdata/viableIDs.rda and GenomeInfoDbData/data/specData.rda files have been updated and pushed in their respective packages. The scripts to generate these data files are in the packages' inst/scripts directories, as viableIDs.R and updateGenomeInfoDbData.R respectively.
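A quick sanity check before launching the long run, assuming both packages are installed from devel on the machine doing the work (a sketch, not part of the recipe itself):

```r
# Confirm the installed versions and that the updated data file is present.
packageVersion("AnnotationForge")
packageVersion("GenomeInfoDbData")
system.file("extdata", "viableIDs.rda", package = "AnnotationForge")  # "" if missing
```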
On AWS:
Create S3 bucket: annotationhub/ncbi/uniprot/
On EC2 Instance:
The files will be downloaded, converted, and pushed to the S3 bucket. This should be done on the EC2 instance val_annotations. If it is not running, start the EC2 instance on AWS and log in as user ubuntu.
Because this can take a while, it is recommended to use the screen application. Some useful screen commands:

- Start screen by typing 'screen'
- cd to the directory you want to be in, then start the process or code you want to run
- Detach from the screen session with 'ctrl-a' 'd'
- List screen sessions with 'screen -ls'
- Reconnect to a specific session (e.g., XYZ) with 'screen -r XYZ'
```r
library(AnnotationHubData)

# See the man page for clarification on testing and actively pushing:
# ?makeNCBIToOrgDbsToAHM
# To populate the S3 bucket, metadataOnly must be FALSE.
# metadataOnly controls population of the S3 bucket;
# insert controls inserting into the database.
meta <- updateResources(getwd(),
                        BiocVersion = "3.5",
                        preparerClasses = "NCBIImportPreparer",
                        metadataOnly = TRUE, insert = FALSE,
                        justRunUnitTest = FALSE)

# A suggested step is to save(meta, file="metadataForTwoBit")
# and scp it to your local machine.
```
Note: we have had issues in the past with the recipe completing for all desired resources. The recipe, when repeatedly run, will check the appropriate S3 bucket to compare what resources still need to be processed. The helper function needToRerunNonStandardOrgDb can be run to determine if a repeat call should be made. This is important because if all the resources are on AWS and metadataOnly=FALSE, it will assume you want to overwrite all the files and begin the generation over.
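A hedged way to look the helper over before deciding (it lives in AnnotationHubData but may not be exported, and its arguments are not documented here):

```r
library(AnnotationHubData)
# Inspect the helper's formal arguments before calling it.
args(AnnotationHubData:::needToRerunNonStandardOrgDb)
```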
On Local Machine:
Navigate to the AnnotationHub_docker directory
Start the docker:
```sh
export MYSQL_REMOTE_PASSWORD=***   # See credentials doc
sudo docker-compose up
```
```r
options(AH_SERVER_POST_URL="http://localhost:3000/resource")
options(ANNOTATION_HUB_URL="http://localhost:3000")
url <- getOption("AH_SERVER_POST_URL")
library(AnnotationHubData)

# option 1:
meta <- updateResources(getwd(),
                        BiocVersion = "3.5",
                        preparerClasses = "NCBIImportPreparer",
                        metadataOnly = TRUE, insert = FALSE,
                        justRunUnitTest = FALSE)

# option 2: if you saved the meta from the EC2 instance
load("metadataForTwoBit")

pushMetadata(meta, url)
```
exit R
Convert db to sqlite (puts the file in the data/ directory)
sudo docker exec annotationhub_annotationhub_1 bash /bin/backup_db.sh
ExperimentHub resources are added upon request or when it is recommended that a package be an Experiment Data package rather than a Software package. The maintainer of the package that will use such data will reach out. The process is then similar to the AnnotationHubData section on adding contributor resources.
The user will upload files to S3 in a temporary location. Send them the instructions found here
A key will have to be generated for them to access and use the AnnotationContributor account. Go here and, under the 'Security credentials' tab, click 'Create access key'. Send the Access key ID to the contributor; the Secret access key is stored in AWS. When the contributor is done you can delete the key (small 'x' at the right of the key row).
Advise that their data should be in a directory with the same name as the software package that will use the Experiment Data when uploading (i.e., a software package Test would upload files to S3 in a folder Test -> Test/file1, Test/file2, etc.). If subdirectories are needed that is okay, but ensure the RDataPath in the metadata.csv reflects this structure.
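A hypothetical illustration (package, version, and file names are placeholders): for a software package Test with a versioned subdirectory, the RDataPath values in metadata.csv should mirror the S3 layout, e.g.

```r
# metadata.csv RDataPath values mirroring the S3 layout Test/v1.0/...
rdatapaths <- c("Test/v1.0/file1.rda",
                "Test/v1.0/file2.rda")
rdatapaths
```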
Once the data is uploaded to S3, move it to the proper location.
We will need a copy of the package to generate and test the annotations. Request a link to the package from the user. The following should be sent to the user to ensure the package is set up correctly: instructions.
In general, generate the list of ExperimentHubMetadata objects with makeExperimentHubMetadata() or addResources(). To test that the metadata.csv is properly formatted, run makeExperimentHubMetadata().
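A minimal sketch of that check, mirroring the AnnotationHub case above (the package path is a placeholder):

```r
library(ExperimentHubData)

# Parse the contributed package's inst/extdata/metadata.csv; errors point
# at missing or malformed fields.
meta <- makeExperimentHubMetadata("/path/to/ContributedExperimentPackage",
                                  fileName = "metadata.csv")
meta[[1]]  # inspect the resulting ExperimentHubMetadata objects
```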
Info on ExperimentHub docker and how to set up docker directory.
When satisfied, start the ExperimentHub docker and add the resource to it.
Navigate to the ExperimentHub_docker directory
Start the docker:
```sh
export MYSQL_REMOTE_PASSWORD=***   # See credentials doc
sudo docker-compose up
```
```r
options(EXPERIMENT_HUB_SERVER_POST_URL="http://localhost:4000/resource")
options(EXPERIMENT_HUB_URL="http://localhost:4000")
library(ExperimentHubData)
url <- getOption("EXPERIMENT_HUB_SERVER_POST_URL")

# Run the appropriate makeExperimentHubMetadata() call and then, if necessary,
# pushMetadata(meta, url)
```
exit R
Test: from the list of running dockers (sudo docker ps), find the process with db in the name, for example test_db1. Connect to the container with sudo docker exec -ti test_db bash. Log into mysql with mysql -p -u hubuser (the password will be the same as the exported MYSQL_REMOTE_PASSWORD). Explore with mysql commands like select * from resources order by id desc limit 5;
Convert the db to sqlite (puts the file in the data/ directory). Which command to run will depend on the name of the process, but it should be one of the following:
```sh
sudo docker exec experimenthubdocker_experimenthub_1 bash /bin/backup_db.sh
sudo docker exec experimenthub_docker_experimenthub_1 bash /bin/backup_db.sh
```
Some other notes and helpful hints:
If a new recipe needs to be added, it is added in AnnotationHub. Be sure to then update the version of the AnnotationHub dependency in the DESCRIPTION of ExperimentHub and bump the ExperimentHub version.
Remember that when an ExperimentHub package is accepted, it is uploaded to a different repo.