cloudos

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)

Lifecycle: stable R build status

cloudos R package makes it easy to interact with Lifebit's CloudOS platform in an R environment.

Installation

You can install the latest release of cloudos from:

install.packages("cloudos")

```{shell, eval=FALSE} conda install -c conda-forge r-cloudos

+ [GitHub](https://github.com/lifebit-ai/cloudos/):
```r
if (!require(remotes)) { install.packages("remotes") }
  remotes::install_github("lifebit-ai/cloudos")

Alternatively, you can install the latest development version of cloudos: ```{shell, eval=FALSE} git clone https://github.com/lifebit-ai/cloudos cd cloudos git checkout origin/devel Rscript -e 'devtools::install(".")'

## Usage

Below is a demonstration of how the **cloudos** package can be used.

### Load the library

```r
library(cloudos)
library(knitr) # For better visualization of wide dataframes in this README examples
library(magrittr) # For pipe

Configure CloudOS

This package is primarily a means of communicating with a CloudOS instance using it's API. Before it can communicate with the CloudOS instance, the package must be configured with some key information: - The CloudOS base URL. This is the URL in your browser when you navigate to the Cohort Browser in CloudOS. Often of the form https://my_instance.lifebit.ai/app/cohort-browser. - The CloudOS token. Navigate to settings page in CloudOS to generate an API key you can use as your token (see image below). - The CloudOS team ID. Also found in the settings page in CloudOS labelled as the "Workspace ID" (see image below).

CloudOS settings page

The package will look for this information in the following locations in this order:

  1. From environment variables CLOUDOS_BASEURL, CLOUDOS_TOKEN, and CLOUDOS_TEAMID.
  2. From a cloudos configuration file.

There are three ways to configure the package:

  1. Add them to ~/.Renviron in the following way, which will load the environment variables on beginning of the R-session
CLOUDOS_BASEURL="xxx"
CLOUDOS_TOKEN="xxx"
CLOUDOS_TEAMID="xxx"
  1. Add them during an R session using Sys.setenv(ENV_VAR = "env_var_value")
Sys.setenv(CLOUDOS_BASEURL = "xxx")
Sys.setenv(CLOUDOS_TOKEN = "xxx")
Sys.setenv(CLOUDOS_TEAMID = "xxx")
  1. Use the function cloudos_configure(), which will create a ~/.cloudos/config that will persist between R sessions and be read from each time (Recommended way if you are using multiple cloudos clients).
cloudos_configure(base_url = "xxx",
                  token = "xxx",
                  team_id = "xxx")

Application - Cohort Browser

Below information is out of date, please refer to the latest function docs.

Cohort Browser is part of Lifebit's CloudOS offering. Let's explore how to interact with this in R environment.

List Cohorts

To check list of available cohorts in a workspace.

cohorts <- cb_list_cohorts()
cohorts %>% head(n=5) %>% kable()

Create a cohort

To create a new cohort.

my_cohort <- cb_create_cohort(cohort_name = "Cohort-R",
                             cohort_desc = "This cohort is for testing purpose, created from R.")
my_cohort

Get a cohort

Get a available cohort in to a cohort R object. This cohort object can be used in many different other functions.

other_cohort <- cb_load_cohort(cohort_id = "610ac00edb7c7a1d9d0c309f")
other_cohort

Explore available phenotypes

Search phenotypes

Search for phenotypes based on a term. Searching with term = "" will return all the available phenotypes.

disease_phenotypes <- cb_search_phenotypes(term = "disease")
disease_phenotypes %>% head(n=5) %>% kable()

Let's choose a phenotype from the above table. The "id" is the most important part as it will allow us to use this phenotype for cohort queries and other functions.

# get the first row/phenotype in the table
my_phenotype <- disease_phenotypes[5,]
my_phenotype %>% kable()

Get distribution of cohort participants for a phenotype

Let's check the numbers of participants across the categories of this phenotype.

# phenotype
my_pheno_data <- cb_get_phenotype_statistics(cohort = my_cohort, 
                                     pheno_id = my_phenotype$id)
my_pheno_data %>% head(n=10) %>% kable()

Update a cohort with a new query

A query defines what particpants are included in a cohort based on phenotypes.

Phenotypes can be continuous - in which case a selected range needs to be specified, or they can be categorical - in which case selected categories need to be specified.

Continuous phenotype

For phenotype "Year of birth" (with id = 8)

# cb_get_phenotype_metadata(8)$name
# "Year of birth"
cb_get_phenotype_statistics(cohort = my_cohort, pheno_id = 8) %>% head(n=10) %>% kable()

Categorical phenotype

For phenotype "Total full brothers" (with id = 48).

# cb_get_phenotype_metadata(48)$name
# "Total full brothers"
cb_get_phenotype_statistics(cohort = my_cohort, pheno_id = 48) %>% kable()

Simple query

Now let's restrict our cohort to a set of participants based on the phenotypes we explored above.

Simple queries allow us to create a cohort query that combines a list of phenotype criteria according to a logical AND. Let's define a simple query using the following named list-based format (note the phenotype id is quoted):

# year of birth: 1965 - 1995 ; AND total full brothers: 1 or 2
simple_query = list("8" = list("from" = 1965, "to" = 1995),
                    "48" = c(1, 2))

Let's check how many participants would be in the cohort if we applied this query, but without actually applying it.

cb_participant_count(cohort = my_cohort, simple_query = simple_query, keep_query = F)

If we're happy that this is a sensible query to apply, we can apply the query to the cohort, making sure to override the previous query by setting keep_query to FALSE. If we wanted to keep the criteria from the pre-exisitng query and add our new phenotype-based criteria to them we would leave keep_query set to the defualt value of TRUE.

# apply the query
cb_apply_query(cohort = my_cohort, simple_query = simple_query, keep_query = F)

# update the local cohort object with info from the changed version on the server
my_cohort <- cb_load_cohort(my_cohort@id)

# double check that the cohort has th number of participants we expected
cb_participant_count(cohort = my_cohort)

We could now further restrict our cohort to include only females (phenotype "Participant phenotypic sex", id = 10) by using keep_query = TRUE. In other words, this argument applies a query that looks like "old query AND new query".

# apply the query
cb_apply_query(cohort = my_cohort, simple_query = list("10" = "Female"), keep_query = T)

# update the local cohort object with info from the changed version on the server
my_cohort <- cb_load_cohort(my_cohort@id)

# check the number of participants
cb_participant_count(my_cohort)

Advanced query

We could adjust our previous filter to restrict participants to those who are born from 1965 to 1995 OR have 1 or 2 full brothers as well as being female (( Total full brothers = 1, 2 OR 1965 < Year of birth < 1995 ) AND Participant phenotypic sex = Female). We could achieve this with an advanced query which uses a more complicated (but more flexible) nested list format (note the phenotype id is not quoted):

adv_query <- list(
  "operator" = "AND",
  "queries" = list(
    list("id" = 10, "value" = "Female"),
    list(
      "operator" = "OR",
      "queries" = list(
        list("id" = 8, "value" = list("from" = 1965, "to" = 1995)),
        list("id" = 48, "value" = c(1, 2))
      )
    )
  )
)

Available operators in advanced queries: "AND", "OR", "NOT".

Lets apply this query to our cohort and inspect the distribution of our phenotype of interest in the cohort.

# apply the query
cb_apply_query(cohort = my_cohort, adv_query = adv_query, keep_query = F)

# update the local cohort object with info from the changed version on the server
my_cohort <- cb_load_cohort(my_cohort@id)

# view the distribution of disease groups in our cohort
cb_get_phenotype_statistics(cohort = my_cohort, pheno_id = 206) %>% head(n=10) %>% kable()

Retreive the participant table

Now lets get a participant phenotype table with the columns of interest for our cohort.

First we have to update the cohort on the cohort browser server to set what columns will be in the table. Currently the best way to do this is to use (counterintuitively) cb_apply_query to add the IDs of the phenotypes of interest as columns.

cb_apply_query(my_cohort, column_ids = c(208, 10, 8, 48), keep_columns = T)
my_cohort <- cb_load_cohort(my_cohort@id)

Now we can fetch the participant phenotype table which includes these columns.

pheno_df <- cb_get_participants_table(cohort = my_cohort,
                                      page_size = cb_participant_count(my_cohort)$count)

pheno_df %>% head(n=10) %>% kable()

Get genotypic table

Get the genotypic table for a cohort (currently only cohort browser version 1 is supported).

cohort_genotype <- cb_get_genotypic_table(cohort = my_cohort)
cohort_genotype %>% head(n=2) %>% kable()

Additional notes

This package is under active development. If you find any issues, please reach out here - https://github.com/lifebit-ai/cloudos/issues

For documentation visit - https://lifebit-ai.github.io/cloudos/

License

MIT © Lifebit



lifebit-ai/cloudos documentation built on March 25, 2023, 2:47 a.m.