#install.packages("devtools", repos = "https://cran.rstudio.com/")
library("devtools")
install_github("hms-dbmi/sandboxR", force = TRUE)
library(sandboxR)
lsf.str("package:sandboxR")
Note that if you run this function in a JupyterHub environment, it will return a URL, since JupyterHub does not have access to your local browser.
search.dbgap("Jackson")
phs.version("286")
is.parent("000286") # JHS main cohort
is.parent("phs499") # substudy "CARe" for JHS
parent.study("phs000499")
sub.study("286") # note here that the substudy "TOPMed" is missing because it has not been fully integrated yet
study.name("286")
browse.dbgap("286")
browse.study("286")
JHS <- "phs000286"
consent.groups(JHS)
n.pop(JHS)
n.pop(JHS, consentgroups = FALSE)
n.tables(JHS)
n.variables(JHS)
tablesdict <- datatables.dict(JHS)
head(tablesdict)
vardict <- variables.dict(JHS)
head(vardict)
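Since head() only shows the first few rows, a keyword search over the dictionary can help locate variables of interest. The following is a minimal sketch in base R, assuming that variables.dict() returns a data frame; `find_rows` is a hypothetical helper and not part of sandboxR, and the toy column names are made up for illustration.

```r
# Return the rows of a data frame in which any column matches `keyword`
# (case-insensitive regular expression). Hypothetical helper, not in sandboxR.
find_rows <- function(df, keyword) {
  hits <- vapply(
    df,
    function(col) grepl(keyword, as.character(col), ignore.case = TRUE),
    logical(nrow(df))
  )
  # Force a rows-by-columns matrix, also when the data frame has a single row
  hits <- matrix(hits, nrow = nrow(df))
  df[rowSums(hits) > 0, , drop = FALSE]
}

# Toy dictionary standing in for `vardict`; these column names are invented.
toy <- data.frame(
  id = c("v1", "v2", "v3"),
  description = c("Systolic blood pressure", "Age at visit",
                  "Diastolic blood pressure"),
  stringsAsFactors = FALSE
)
bp <- find_rows(toy, "blood pressure")
print(bp)

# With the real dictionary, something like:
# find_rows(vardict, "blood pressure")
```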
Now that we have explored our datasets, let's use sandboxR to clean our variables and gather them into a tree that will be easier for researchers to use. Note that chapter 3 requires moving and creating many files in your environment, so it is easier to follow on your local computer than in the JupyterHub environment.
To get your data from dbGaP, you need to request access and obtain a decryption key. This is done here: https://dbgap.ncbi.nlm.nih.gov/aa/wga.cgi?login=&page=login
We found that the dbGaP decryption system can be tricky, so we created dbgap.decrypt() to easily decrypt the files you have downloaded. Note that the "files" argument can be a single file or a folder containing multiple encrypted files. Also, this function currently works only on macOS.
key <- "path/to/your/key.ngc"
files <- "path/to/the/files/you/want/to/decrypt.ncbi_enc"
dbgap.decrypt(files, key)
Once the dbGaP files are decrypted, you will have one folder per consent group, each containing one file per data table. The goal of the sandbox() function is to create a folder with a "0_map.csv" file that maps all your variables, and a subfolder "study_tree" containing one .csv file per variable in your study, grouped by data table.
cg <- c("path/to/the/first/folder/containing/a/consent/group", "second/folder") # add one path per remaining consent group
destination <- "path/where/your/tree/will/be/located"
sandbox(JHS, cg, destination)
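After sandbox() has run, it can be useful to confirm that the expected layout is in place. The sketch below uses only base R and relies on the layout described above (a "0_map.csv" file next to a "study_tree" subfolder holding .csv files). `inspect_tree` is a hypothetical helper, not part of sandboxR, and it assumes `path` is the folder that directly contains the map and the tree.

```r
# Summarise a sandboxR output folder. Hypothetical helper, not in sandboxR.
# `path` is assumed to be the folder holding "0_map.csv" and "study_tree".
inspect_tree <- function(path) {
  map_file <- file.path(path, "0_map.csv")
  tree_dir <- file.path(path, "study_tree")
  stopifnot(file.exists(map_file), dir.exists(tree_dir))

  map <- read.csv(map_file, stringsAsFactors = FALSE)
  # Paths of all .csv files under study_tree, relative to study_tree
  var_files <- list.files(tree_dir, pattern = "\\.csv$", recursive = TRUE)

  list(
    n_map_rows = nrow(map),                 # rows in the map
    n_variable_files = length(var_files),   # .csv files found in the tree
    folders = unique(dirname(var_files))    # subfolders that contain them
  )
}

# Usage, once sandbox() has finished:
# str(inspect_tree("path/to/the/folder/with/the/map/and/the/tree"))
```

If the number of map rows and the number of variable files differ, that may be worth checking before reorganizing the tree.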
Once your first tree has been created, you can easily modify the arrangement of your variables by creating new subdirectories, and by dragging and dropping your variable files. You can also change the name of your directories and variables. Be careful not to delete a variable file in this process.
Then, use the function TreeToMap() in order to reflect your modifications in your "0_map.csv" file.
path <- "Pathway/to/the/folder/where/the/map/and/the/tree/are/located"
TreeToMap(path)
Similarly, you can modify the names of your files directly in the "0_map.csv" file. Modify the 5th column, "data_label", to change the names of your variables. Then use the MapToTree() function to reflect your modifications in your tree.
MapToTree(path)
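The "data_label" column can also be edited from R rather than in a spreadsheet, before MapToTree() is called. The following is a minimal sketch in base R; `relabel` is a hypothetical helper, not part of sandboxR, and the file round trip is left commented out because it is an assumption that a map rewritten with write.csv() is still read correctly by the package. Keep a backup of the map before trying it.

```r
# Change one entry of the "data_label" column of a map held in a data frame.
# Hypothetical helper, not in sandboxR.
relabel <- function(map, old_label, new_label) {
  # The text above places "data_label" in the 5th column; stop otherwise
  stopifnot(ncol(map) >= 5, names(map)[5] == "data_label")
  rows <- which(map$data_label == old_label)
  if (length(rows) == 0) stop("label not found: ", old_label)
  map$data_label[rows] <- new_label
  map
}

# Sketch of a full round trip (commented out because it rewrites the map):
# map_file <- file.path(path, "0_map.csv")
# file.copy(map_file, file.path(tempdir(), "0_map_backup.csv"))  # backup
# map <- read.csv(map_file, stringsAsFactors = FALSE, check.names = FALSE)
# map <- relabel(map, "old name", "new name")
# write.csv(map, map_file, row.names = FALSE)
# MapToTree(path)
```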
Each time you use TreeToMap(), the old map is saved with a time stamp in a hidden folder called ".oldmaps". Use list.oldmaps() to list your previous maps, look.oldmaps() to view one of these maps as a data.frame, and recover.map() to change your tree and your "0_map.csv" according to one of your previous maps.
list.oldmaps(path)
look.oldmaps(path, "olmap YYYY-MM-DD HHMM AM")
recover.map(path, "olmap YYYY-MM-DD HHMM AM")
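Because ".oldmaps" is a hidden folder, it does not show up in a default file listing. As a cross-check alongside list.oldmaps(), the saved maps can be listed with base R. This is a sketch that assumes ".oldmaps" sits directly inside `path`; `peek_oldmaps` is a hypothetical helper, not part of sandboxR.

```r
# List the files saved in the hidden ".oldmaps" folder, newest first.
# Hypothetical helper, not in sandboxR.
peek_oldmaps <- function(path) {
  old_dir <- file.path(path, ".oldmaps")
  if (!dir.exists(old_dir)) return(character(0))
  # all.files = TRUE includes hidden files; no.. = TRUE drops "." and ".."
  files <- list.files(old_dir, all.files = TRUE, no.. = TRUE, full.names = TRUE)
  if (length(files) == 0) return(character(0))
  # Sort by modification time, most recent first
  basename(files[order(file.info(files)$mtime, decreasing = TRUE)])
}

# Usage:
# peek_oldmaps(path)
```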