knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)

cluequery

The goal of cluequery is to provide an easy R interface to query Connectivity Map data from Clue.

Installation

You can install the current version of cluequery from Github with:

if (!requireNamespace("remotes", quietly = TRUE))
  install.packages("remotes")
remotes::install_github("labsyspharm/cluequery")

API key

In order to use this package an API key from Clue is required. For academice purposes, they are freely available at Clue.

The API key can be automatically retrieved by all cluequery functions if it is added to the ~/.Renviron file:

CLUE_API_KEY=xxx

Example

This is a basic example how to query Clue for a gene signature of interest. Gene signatures can come from any source, but must contain a set of upregulated genes and a set of downregulated genes.

The gene signatures must be converted into GMT format for use with Clue.

Gene signature derived from differential expression with DESeq2

In this example, we generate a gene set compatible with Clue from a DESeq2 differential expression experiment.

First, we load our example DESeq2 result:

library(cluequery)
deseq2_res_path <- system.file("extdata", "example_deseq2_result.csv.xz", package = "cluequery")
deseq2_res <- read.csv(deseq2_res_path)
head(deseq2_res)

Note that gene IDs need to be Entrez IDs.

We need to choose a name for the gene set and decide on a significance cutoff (alpha) for our differentially expressed genes.

deseq2_gmt <- clue_gmt_from_deseq2(deseq2_res, name = "treatment_drug_x", alpha = 0.05)
str(deseq2_gmt)

A number of warnings are raised, indicating that some gene IDs are not part of the BING gene space. These genes are removed from the gene sets and not considered by Clue.

clue_prepare_deseq2 returns a named vector with paths to the GMT files for the up- and the down-regulated genes.

Gene signature derived from pre-existing lists

We can also convert an pre-existing set of genes from any source to GMT files:

up_genes <- c(
  "10365", "1831", "9314", "4846", "678", "22992", "3397", "26136", 
  "79637", "5551", "7056", "79888", "1032", "51278", "64866", "29775", 
  "994", "51696", "81839", "23580", "219654", "57178", "7014", 
  "57513", "51599", "55818", "4005", "4130", "4851", "2050", "50650", 
  "9469", "54438", "3628", "54922", "3691", "65981", "54820", "2261", 
  "2591", "7133", "162427", "10912", "8581", "2523", "25807", "9922", 
  "30850", "4862", "8567", "79686", "55615", "51283", "3337", "2887", 
  "3223", "6915", "6907", "26056", "259217", "6574", "23097", "5164", 
  "57493", "7071", "5450", "113146", "8650"
)
down_genes <- c(
  "5128", "5046", "956", "10426", "9188", "23403", "7204", "1827", 
  "3491", "9076", "330", "8540", "22800", "10687", "19", "63875", 
  "10979", "51154", "10370", "50628", "7128", "6617", "7187", "22916", 
  "81034", "58516", "3096", "4794", "5202", "26511", "8767", "2355", 
  "22943", "1490", "133", "11010", "51025", "23160", "56902", "3981", 
  "5209", "6347", "5806", "7357", "9425", "3399", "6446", "64328", 
  "6722", "8545", "688", "861", "390", "23034", "51330", "51474", 
  "2633", "4609"
)

pre_gmt <- clue_gmt_from_list(up_genes, down_genes, "my_favourite_genes")
str(pre_gmt)

Querying Clue

Now that we have GMT files, we can query Clue. Here we use the GMT files derived from the DESeq2 result, but we could also use the pre_gmt files generated above.

submission_result <- clue_query_submit(
  deseq2_gmt[["up"]], deseq2_gmt[["down"]], name = "deseq2_query_job", use_fast_tool = FALSE
)
submission_result$result$job_id

clue_query_submit returns a nested list containing information about the submitted job, including the job id.

Now we have to wait for the job to finish. We can use the function clue_query_wait to pause our script until the results are ready.

clue_query_wait(submission_result, interval = 60, timeout = 600)

clue_query_wait will automatically continue execution of the script once results are ready. Now we can download and parse the results:

result_path <- clue_query_download(submission_result)
result_df <- clue_parse_result(result_path)
knitr::kable(head(result_df, n = 10))

Funding

We gratefully acknowledge support by NIH Grant 1U54CA225088-01: Systems Pharmacology of Therapeutic and Adverse Responses to Immune Checkpoint and Small Molecule Drugs.



labsyspharm/cluequery documentation built on May 23, 2022, 6:25 a.m.