```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```
The **cloudos** R package makes it easy to interact with Lifebit's CloudOS platform from an R environment.
You can install the latest release of cloudos from:

+ CRAN:

  ```r
  install.packages("cloudos")
  ```

+ conda-forge:

  ```{shell, eval=FALSE}
  conda install -c conda-forge r-cloudos
  ```

+ [GitHub](https://github.com/lifebit-ai/cloudos/):

  ```r
  if (!requireNamespace("remotes", quietly = TRUE)) {
    install.packages("remotes")
  }
  remotes::install_github("lifebit-ai/cloudos")
  ```

Alternatively, you can install the latest development version of cloudos:

```{shell, eval=FALSE}
git clone https://github.com/lifebit-ai/cloudos
cd cloudos
git checkout devel
Rscript -e 'devtools::install(".")'
```
## Usage

Below is a demonstration of how the **cloudos** package can be used.

### Load the library

```r
library(cloudos)
library(knitr)    # For better visualization of wide data frames in these README examples
library(magrittr) # For the pipe operator (%>%)
```
This package is primarily a means of communicating with a CloudOS instance through its API. Before it can communicate with the CloudOS instance, the package must be configured with some key information:
- The CloudOS base URL. This is the URL shown in your browser when you navigate to the Cohort Browser in CloudOS, often of the form `https://my_instance.lifebit.ai/app/cohort-browser`.
- The CloudOS token. Navigate to the settings page in CloudOS to generate an API key that you can use as your token (see image below).
- The CloudOS team ID. This is also found on the settings page in CloudOS, labelled "Workspace ID" (see image below).
The package will look for this information in the following locations, in this order:

- The environment variables `CLOUDOS_BASEURL`, `CLOUDOS_TOKEN`, and `CLOUDOS_TEAMID`.

There are three ways to configure the package:
1. Add the variables to `~/.Renviron` as shown below, so that they are loaded as environment variables at the start of every R session:

   ```
   CLOUDOS_BASEURL="xxx"
   CLOUDOS_TOKEN="xxx"
   CLOUDOS_TEAMID="xxx"
   ```

2. Set the environment variables for the current R session only, using `Sys.setenv(ENV_VAR = "env_var_value")`:

   ```r
   Sys.setenv(CLOUDOS_BASEURL = "xxx")
   Sys.setenv(CLOUDOS_TOKEN = "xxx")
   Sys.setenv(CLOUDOS_TEAMID = "xxx")
   ```

3. Run `cloudos_configure()`, which creates a `~/.cloudos/config` file that persists between R sessions and is read each time (the recommended way if you are using multiple cloudos clients):

   ```r
   cloudos_configure(base_url = "xxx", token = "xxx", team_id = "xxx")
   ```
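If you rely on the first two approaches, a quick check before calling any API function can save a confusing error later. The sketch below uses only base R; the helper name `missing_cloudos_vars()` is hypothetical and not part of cloudos. It inspects environment variables only and does not read the `~/.cloudos/config` file written by `cloudos_configure()`, so it reports a message rather than stopping.

```r
# Hypothetical helper, not part of cloudos: return the names of the CloudOS
# environment variables that are unset or empty in the current R session.
# It only checks environment variables; it does not read ~/.cloudos/config.
missing_cloudos_vars <- function(vars = c("CLOUDOS_BASEURL",
                                          "CLOUDOS_TOKEN",
                                          "CLOUDOS_TEAMID")) {
  values <- Sys.getenv(vars, unset = "")
  vars[!nzchar(values)]
}

# Report any missing variables before making API calls
missing <- missing_cloudos_vars()
if (length(missing) > 0) {
  message("Not set: ", paste(missing, collapse = ", "))
}
```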
The information below is out of date; please refer to the latest function documentation.

The Cohort Browser is part of Lifebit's CloudOS offering. Let's explore how to interact with it from an R environment.
To list the cohorts available in a workspace:

```r
cohorts <- cb_list_cohorts()
cohorts %>% head(n = 5) %>% kable()
```
To create a new cohort:

```r
my_cohort <- cb_create_cohort(cohort_name = "Cohort-R",
                              cohort_desc = "This cohort is for testing purposes, created from R.")
my_cohort
```
To load an existing cohort into a cohort R object, which can then be passed to many other functions:

```r
other_cohort <- cb_load_cohort(cohort_id = "610ac00edb7c7a1d9d0c309f")
other_cohort
```
To search for phenotypes based on a term (searching with `term = ""` returns all available phenotypes):

```r
disease_phenotypes <- cb_search_phenotypes(term = "disease")
disease_phenotypes %>% head(n = 5) %>% kable()
```
Let's choose a phenotype from the table above. The "id" is the most important part, as it allows us to use this phenotype in cohort queries and other functions.

```r
# get the fifth row/phenotype in the table
my_phenotype <- disease_phenotypes[5, ]
my_phenotype %>% kable()
```
Let's check the number of participants in each category of this phenotype:

```r
# participant counts for each category of the chosen phenotype
my_pheno_data <- cb_get_phenotype_statistics(cohort = my_cohort, pheno_id = my_phenotype$id)
my_pheno_data %>% head(n = 10) %>% kable()
```
A query defines which participants are included in a cohort based on phenotypes.

Phenotypes can be continuous, in which case a range must be specified, or categorical, in which case the selected categories must be specified.
For the phenotype "Year of birth" (id = 8):

```r
# cb_get_phenotype_metadata(8)$name returns "Year of birth"
cb_get_phenotype_statistics(cohort = my_cohort, pheno_id = 8) %>% head(n = 10) %>% kable()
```
For the phenotype "Total full brothers" (id = 48):

```r
# cb_get_phenotype_metadata(48)$name returns "Total full brothers"
cb_get_phenotype_statistics(cohort = my_cohort, pheno_id = 48) %>% kable()
```
Now let's restrict our cohort to a set of participants based on the phenotypes we explored above.
Simple queries allow us to create a cohort query that combines a list of phenotype criteria according to a logical AND. Let's define a simple query using the following named list-based format (note the phenotype id is quoted):
```r
# year of birth: 1965 - 1995; AND total full brothers: 1 or 2
simple_query <- list("8" = list("from" = 1965, "to" = 1995), "48" = c(1, 2))
```
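Because the simple query format mixes two kinds of criteria (a `from`/`to` list for continuous phenotypes and a vector of categories for categorical ones), it can help to print a query in readable form before sending it to the server. This is a base R sketch; `describe_simple_query()` is a hypothetical helper, not part of cloudos, and it assumes every criterion follows one of the two shapes shown above.

```r
# Hypothetical helper, not part of cloudos: turn a simple query (a named list
# keyed by quoted phenotype id) into readable text. Range criteria are lists
# with "from" and "to"; categorical criteria are vectors of selected values.
describe_simple_query <- function(simple_query) {
  parts <- vapply(names(simple_query), function(id) {
    crit <- simple_query[[id]]
    if (is.list(crit) && all(c("from", "to") %in% names(crit))) {
      sprintf("phenotype %s from %s to %s", id, crit$from, crit$to)
    } else {
      sprintf("phenotype %s in {%s}", id, paste(crit, collapse = ", "))
    }
  }, character(1), USE.NAMES = FALSE)
  # a simple query combines its criteria with a logical AND
  paste(parts, collapse = " AND ")
}

describe_simple_query(list("8" = list("from" = 1965, "to" = 1995), "48" = c(1, 2)))
#> [1] "phenotype 8 from 1965 to 1995 AND phenotype 48 in {1, 2}"
```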
Let's check how many participants would be in the cohort if we applied this query, but without actually applying it.
```r
cb_participant_count(cohort = my_cohort, simple_query = simple_query, keep_query = FALSE)
```
If we're happy that this is a sensible query, we can apply it to the cohort, making sure to override the previous query by setting `keep_query` to `FALSE`. If we wanted to keep the criteria from the pre-existing query and add our new phenotype-based criteria to them, we would leave `keep_query` set to its default value of `TRUE`.
```r
# apply the query
cb_apply_query(cohort = my_cohort, simple_query = simple_query, keep_query = FALSE)
# update the local cohort object with info from the changed version on the server
my_cohort <- cb_load_cohort(my_cohort@id)
# double check that the cohort has the number of participants we expected
cb_participant_count(cohort = my_cohort)
```
We could now further restrict our cohort to include only females (phenotype "Participant phenotypic sex", id = 10) by using `keep_query = TRUE`. In other words, this argument applies a query that looks like "old query AND new query".
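Since a simple query is itself an AND over its entries, "old query AND new query" can be pictured locally as concatenating two simple queries. The sketch below is only an illustration under that assumption and does not replace `keep_query = TRUE` (the server-side query may, for example, be an advanced query). `and_simple_queries()` is a hypothetical helper, not part of cloudos; it refuses queries that constrain the same phenotype, because simply concatenating them would not express an intersection of the two criteria.

```r
# Hypothetical helper, not part of cloudos: build the simple query that
# corresponds to "old query AND new query" locally. This assumes both inputs
# use the simple query format and constrain different phenotypes.
and_simple_queries <- function(old, new) {
  shared <- intersect(names(old), names(new))
  if (length(shared) > 0) {
    stop("Both queries constrain phenotype(s): ", paste(shared, collapse = ", "))
  }
  # a simple query is already an AND over its entries, so concatenate them
  c(old, new)
}

combined <- and_simple_queries(
  list("8" = list("from" = 1965, "to" = 1995), "48" = c(1, 2)),
  list("10" = "Female")
)
paste(names(combined), collapse = ", ")
#> [1] "8, 48, 10"
```

A list built this way could then be previewed with `cb_participant_count(cohort = my_cohort, simple_query = combined, keep_query = FALSE)` before anything is applied.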
```r
# apply the query
cb_apply_query(cohort = my_cohort, simple_query = list("10" = "Female"), keep_query = TRUE)
# update the local cohort object with info from the changed version on the server
my_cohort <- cb_load_cohort(my_cohort@id)
# check the number of participants
cb_participant_count(my_cohort)
```
We could adjust our previous filter to restrict participants to those who were born from 1965 to 1995 OR have 1 or 2 full brothers, as well as being female: `(Total full brothers = 1, 2 OR 1965 < Year of birth < 1995) AND Participant phenotypic sex = Female`. We could achieve this with an advanced query, which uses a more complicated (but more flexible) nested list format (note the phenotype id is not quoted):
```r
adv_query <- list(
  "operator" = "AND",
  "queries" = list(
    list("id" = 10, "value" = "Female"),
    list(
      "operator" = "OR",
      "queries" = list(
        list("id" = 8, "value" = list("from" = 1965, "to" = 1995)),
        list("id" = 48, "value" = c(1, 2))
      )
    )
  )
)
```
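The two formats are related: a simple query is an AND over its criteria, and the advanced format expects unquoted ids. The base R sketch below rewrites a simple query as an advanced one, which can be handy as a starting point for nesting extra OR branches. `simple_to_adv_query()` is a hypothetical helper, not part of cloudos, and the equivalence of the two forms on the server is an assumption based on the descriptions in this README.

```r
# Hypothetical helper, not part of cloudos: rewrite a simple query as an
# advanced query with a single top-level AND. Quoted list names become
# numeric "id" entries, and each criterion is copied unchanged into "value".
simple_to_adv_query <- function(simple_query) {
  list(
    "operator" = "AND",
    "queries" = lapply(names(simple_query), function(id) {
      list("id" = as.numeric(id), "value" = simple_query[[id]])
    })
  )
}

str(simple_to_adv_query(list("8" = list("from" = 1965, "to" = 1995), "48" = c(1, 2))))
```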
Available operators in advanced queries: `"AND"`, `"OR"`, and `"NOT"`.
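To see how the nested `operator`/`queries` structure combines criteria, it can help to evaluate a query against a small local data frame. The sketch below is a hypothetical local evaluator, not part of cloudos and not how the Cohort Browser server works internally. It assumes columns are named by phenotype id and that `from`/`to` ranges are inclusive, and it handles only `"AND"` and `"OR"`, because this README does not describe how a `"NOT"` node is laid out. The participant data is made up for illustration.

```r
# Hypothetical local evaluator, not part of cloudos: returns one logical value
# per row of `df`, whose columns are named by phenotype id.
# Ranges ("from"/"to") are treated as inclusive, which is an assumption.
eval_adv_query <- function(query, df) {
  if (!is.null(query$operator)) {
    # internal node: evaluate every child query, then combine the results
    results <- lapply(query$queries, eval_adv_query, df = df)
    switch(query$operator,
      "AND" = Reduce(`&`, results),
      "OR"  = Reduce(`|`, results),
      stop("Operator not handled in this sketch: ", query$operator)
    )
  } else {
    # leaf: compare one phenotype column with a range or a set of categories
    x <- df[[as.character(query$id)]]
    v <- query$value
    if (is.list(v)) x >= v$from & x <= v$to else x %in% v
  }
}

# Made-up participants: year of birth (8), full brothers (48), sex (10)
participants <- data.frame(
  "8" = c(1970, 1950, 1990, 2000),
  "48" = c(1, 1, 0, 3),
  "10" = c("Female", "Female", "Male", "Female"),
  check.names = FALSE
)

# Same structure as the advanced query above
demo_query <- list(
  "operator" = "AND",
  "queries" = list(
    list("id" = 10, "value" = "Female"),
    list("operator" = "OR", "queries" = list(
      list("id" = 8, "value" = list("from" = 1965, "to" = 1995)),
      list("id" = 48, "value" = c(1, 2))
    ))
  )
)

which(eval_adv_query(demo_query, participants))
#> [1] 1 2
```

Row 2 is kept through the OR branch (born outside the range but with one full brother), while row 3 fails the AND on sex.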
Let's apply this query to our cohort and inspect the distribution of our phenotype of interest in the cohort.

```r
# apply the query
cb_apply_query(cohort = my_cohort, adv_query = adv_query, keep_query = FALSE)
# update the local cohort object with info from the changed version on the server
my_cohort <- cb_load_cohort(my_cohort@id)
# view the distribution of disease groups in our cohort
cb_get_phenotype_statistics(cohort = my_cohort, pheno_id = 206) %>% head(n = 10) %>% kable()
```
Now let's get a participant phenotype table with the columns of interest for our cohort.

First, we have to update the cohort on the Cohort Browser server to set which columns will be in the table. Currently, the best way to do this is (counterintuitively) to use `cb_apply_query` to add the IDs of the phenotypes of interest as columns.
```r
cb_apply_query(my_cohort, column_ids = c(208, 10, 8, 48), keep_columns = TRUE)
my_cohort <- cb_load_cohort(my_cohort@id)
```
Now we can fetch the participant phenotype table which includes these columns.
```r
# use the cohort's participant count as the page size to fetch all rows at once
pheno_df <- cb_get_participants_table(cohort = my_cohort,
                                      page_size = cb_participant_count(my_cohort)$count)
pheno_df %>% head(n = 10) %>% kable()
```
To get the genotypic table for a cohort (currently only Cohort Browser version 1 is supported):

```r
cohort_genotype <- cb_get_genotypic_table(cohort = my_cohort)
cohort_genotype %>% head(n = 2) %>% kable()
```
This package is under active development. If you find any issues, please report them here: https://github.com/lifebit-ai/cloudos/issues

For documentation, visit: https://lifebit-ai.github.io/cloudos/
MIT © Lifebit