BiocStyle::markdown()
suppressPackageStartupMessages(library("tidyverse"))

Introduction

The depmap package aims to provide a reproducible research framework for the cancer dependency data described by Tsherniak, Aviad, et al. "Defining a cancer dependency map." Cell 170.3 (2017): 564-576. The data in the depmap package have been formatted to facilitate the use of common R packages such as dplyr and ggplot2. We hope that this package will allow researchers to more easily mine, explore and visualise dependency data from the DepMap cancer genomic dependency study.

Installation instructions

To install depmap, the Bioconductor package manager BiocManager is required. If BiocManager is not already installed, install it first by typing install.packages("BiocManager") within R (this needs to be done only once), then install depmap:

install.packages("BiocManager")
BiocManager::install("depmap")

The depmap package depends on the ExperimentHub Bioconductor package, which stores the data accessed through this package in the cloud and retrieves it on demand.

library("depmap")
library("ExperimentHub")

Tidy depmap data

The depmap package currently contains nine datasets available through ExperimentHub.

The data in this R package have been converted from "wide" format .csv files to "long" format .rda files. None of the values from the original datasets have been changed, although the columns have been rearranged. The changes made are described in the Details section displayed after querying the relevant dataset.
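To illustrate what the wide-to-long conversion means, the sketch below reshapes a tiny, made-up wide table (one row per cell line, one column per gene) into long format (one row per cell line/gene pair) with tidyr::pivot_longer(). The cell line names, gene columns, column names and values are invented for illustration; this is not the package's own conversion code.

```r
library(tibble)
library(tidyr)

## hypothetical wide table: one row per cell line, one column per gene
wide <- tibble(
    cell_line = c("CELL_A", "CELL_B"),
    GENE1 = c(-0.5, 0.1),
    GENE2 = c(0.3, -1.2)
)

## long format: one row per (cell line, gene) pair; the values are unchanged,
## only the layout of the columns differs
long <- pivot_longer(wide,
                     cols = -cell_line,
                     names_to = "gene_name",
                     values_to = "dependency")
long
```

Because every score now sits in a single column, dplyr verbs and ggplot2 aesthetics can refer to it by one name instead of one column per gene.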

## create ExperimentHub query object
eh <- ExperimentHub()
query(eh, "depmap")

Each dataset has an ExperimentHub accession number (e.g. EH2260 refers to the rnai dataset from the 19Q1 release).

RNA interference knockdown data

The rnai dataset contains the combined genetic dependency data for RNAi-induced gene knockdown for select genes and cancer cell lines. This data corresponds to the D2_combined_genetic_dependency_scores.csv file.

Specific rnai datasets, such as rnai_19Q1, can be accessed by EH number.

eh[["EH2260"]]

The most recent rnai dataset can be automatically loaded into R by using the depmap_rnai function.

depmap::depmap_rnai()
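Because the data are in long format, standard dplyr verbs apply directly. The sketch below uses a small hand-made tibble as a stand-in for the output of depmap_rnai(); the column names cell_line, gene_name and dependency, and all values, are assumptions for illustration and should be checked against the real data (for example with colnames()). It keeps one gene, sorts cell lines by ascending score, and computes a mean score per gene.

```r
library(tibble)
library(dplyr)

## hypothetical stand-in for depmap_rnai(); column names are assumptions
rnai_toy <- tibble(
    cell_line  = c("CELL_A", "CELL_B", "CELL_C", "CELL_A", "CELL_B"),
    gene_name  = c("GENE1", "GENE1", "GENE1", "GENE2", "GENE2"),
    dependency = c(-0.2, -1.4, 0.3, 0.5, -0.1)
)

## keep one gene and order cell lines by ascending dependency score
gene1 <- rnai_toy |>
    filter(gene_name == "GENE1") |>
    arrange(dependency)

## mean dependency score per gene across cell lines
per_gene <- rnai_toy |>
    group_by(gene_name) |>
    summarise(mean_dependency = mean(dependency))
```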

CRISPR-Cas9 knockout data

The crispr dataset contains the CRISPR-Cas9 knockout data (batch-corrected, CERES-inferred gene effect) for select genes and cancer cell lines. This data corresponds to the gene_effect_corrected.csv file.

Specific crispr datasets, such as crispr_19Q1, can be accessed by EH number.

eh[["EH2261"]]

The most recent crispr dataset can be automatically loaded into R by using the depmap_crispr function.

depmap::depmap_crispr()

WES copy number data

The copyNumber dataset contains the whole-exome sequencing (WES) copy number data, i.e. the log-fold change in copy number relative to the baseline copy number, for select genes and cell lines. This dataset corresponds to the public_19Q1_gene_cn.csv file.

Specific copyNumber datasets, such as copyNumber_19Q1, can be accessed by EH number.

eh[["EH2262"]]

The most recent copyNumber dataset can be automatically loaded into R by using the depmap_copyNumber function.

depmap::depmap_copyNumber()

CCLE Reverse Phase Protein Array data

The RPPA dataset contains the CCLE Reverse Phase Protein Array (RPPA) data which corresponds to the CCLE_RPPA_20180123.csv file.

Specific RPPA datasets, such as RPPA_19Q1, can be accessed by EH number.

eh[["EH2263"]]

The most recent RPPA dataset can be automatically loaded into R by using the depmap_RPPA function.

depmap::depmap_RPPA()

CCLE RNAseq gene expression data

The TPM dataset contains the CCLE RNAseq gene expression data, restricted to protein-coding genes and expressed on a log2(TPM+1) scale. This data corresponds to the CCLE_depMap_19Q1_TPM.csv file.

Specific TPM datasets, such as TPM_19Q1, can be accessed by EH number.
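As a worked example of the log2(TPM+1) scale, the sketch below converts a few made-up TPM values and back-transforms them with 2^x - 1. Adding 1 before taking the logarithm keeps genes with TPM = 0 at a finite value of 0.

```r
## made-up TPM values for illustration
tpm <- c(0, 1, 3, 15)

## transform to the log2(TPM + 1) scale; the +1 keeps TPM = 0 finite
log_expr <- log2(tpm + 1)
log_expr
## [1] 0 1 2 4

## back-transform from log2(TPM + 1) to TPM
tpm_back <- 2^log_expr - 1
```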

eh[["EH2264"]]

The TPM dataset can also be accessed by using the depmap_TPM function.

depmap::depmap_TPM()

Cancer cell lines

The metadata dataset contains the metadata about all of the cancer cell lines. It corresponds to the depmap_19Q1_cell_lines.csv file.

Specific metadata datasets, such as metadata_19Q1, can be accessed by EH number.

eh[["EH2266"]]

The most recent metadata dataset can be automatically loaded into R by using the depmap_metadata function.

depmap::depmap_metadata()

Mutation calls

The mutationCalls dataset contains all merged mutation calls (coding region, germline filtered) found in the DepMap dependency study. This dataset corresponds to the depmap_19Q1_mutation_calls.csv file.

Specific mutationCalls datasets, such as mutationCalls_19Q1, can be accessed by EH number.

eh[["EH2265"]]

The most recent mutationCalls dataset can be automatically loaded into R by using the depmap_mutationCalls function.

depmap::depmap_mutationCalls()

Drug Sensitivity

The drug_sensitivity dataset contains dependency data for cancer cell lines treated with various compounds. This dataset corresponds to the primary_replicate_collapsed_logfold_change.csv file.

Specific drug_sensitivity datasets, such as drug_sensitivity_19Q3, can be accessed by EH number.

eh[["EH3087"]]

The most recent drug_sensitivity dataset can be automatically loaded into R by using the depmap_drug_sensitivity function.

depmap::depmap_drug_sensitivity()

Proteomic

The proteomic dataset contains normalized quantitative protein profiles of cancer cell lines obtained by mass spectrometry. This dataset corresponds to the protein_quant_current_normalized.csv.gz file available at https://gygi.med.harvard.edu/sites/gygi.med.harvard.edu/files/documents/protein_quant_current_normalized.csv.gz.

Specific proteomic datasets, such as proteomic_20Q2, can be accessed by EH number.

eh[["EH3459"]]

The most recent proteomic dataset can be automatically loaded into R by using the depmap_proteomic function.

depmap::depmap_proteomic()

Repackaged data source

If desired, the original data from which the depmap package was derived can be downloaded from the Broad Institute website. Instructions on how to download these files, and on how the data were transformed and loaded into the depmap package, can be found in the make_data.R file in ./inst/scripts. Note that the original uncompressed .csv files total more than 1.5 GB and can take a while to download.

Original depmap data

In addition to the repackaged files, the package also allows downloading any of the original files provided by the DepMap project on Figshare.

A list of all the datasets is available with the dmsets() function:

dmsets()

We can check which datasets from any quarter of 2020 are available by searching for "20Q" in the dataset titles:

library(tidyverse)
dmsets() |>
    filter(grepl("20Q", title))

Let's focus on the PRISM Repurposing 20Q2 Dataset, with identifier 20564034.

A list of all the files is available with the dmfiles() function:

dmfiles()

If we want to find all files from the PRISM Repurposing 20Q2 Dataset identified above, we could filter all files with its dataset_id:

dmfiles() |>
    filter(dataset_id == 20564034)

Let's now focus on the prism-repurposing-20q2-primary-screen-cell-line-info.csv file. We can filter it by name and download it with dmget():

dmfiles() |>
    filter(name == "prism-repurposing-20q2-primary-screen-cell-line-info.csv") |>
    dmget()

The dmget() function first checks whether the file has already been downloaded and cached in the depmap cache directory (see ?dmCache). If so, it retrieves the file from there; otherwise, it downloads the file and stores it in the package cache directory. In both cases, it returns the location of the cached file.
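The caching behaviour described above follows a common check-then-fetch pattern. The sketch below is a hypothetical, simplified illustration of that pattern in base R, not the code behind dmget(): the function cached_fetch(), its arguments, the toy cache directory and the fake_download() stand-in (which writes a small local file instead of downloading anything) are all invented for this example. Requesting the same file twice triggers only one "download", and both calls return the same path.

```r
## hypothetical sketch of the check-cache-then-fetch pattern;
## not the depmap implementation of dmget()
cached_fetch <- function(name, cache_dir, fetch) {
    dir.create(cache_dir, showWarnings = FALSE, recursive = TRUE)
    path <- file.path(cache_dir, name)
    if (!file.exists(path)) {
        ## not cached yet: let the supplied fetch function create the file
        fetch(path)
    }
    ## in both cases, return the location of the cached file
    path
}

## stand-in "download" that writes a local file and counts its calls
n_calls <- 0
fake_download <- function(dest) {
    n_calls <<- n_calls + 1
    writeLines("cell_line,lineage", dest)
}

## start from an empty toy cache directory
cache <- file.path(tempdir(), "toy_cache")
unlink(cache, recursive = TRUE)

p1 <- cached_fetch("info.csv", cache, fake_download)  # fetches the file
p2 <- cached_fetch("info.csv", cache, fake_download)  # served from the cache
n_calls
## [1] 1
```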

Given that the file is in csv format, we can directly open it with read_csv():

dmfiles() |>
    filter(name == "prism-repurposing-20q2-primary-screen-cell-line-info.csv") |>
    dmget() |>
    read_csv()

It is also possible to pass multiple rows of the dmfiles() table to dmget() to retrieve multiple file paths. Below, let's get all the README.txt files from 2020:

ids_2020 <- filter(dmsets(), grepl("20Q", title)) |>
    pull(dataset_id)

dmfiles() |>
    filter(dataset_id %in% ids_2020) |>
    filter(grepl("README", name)) |>
    dmget()

Session information

```r
sessionInfo()
```

UCLouvain-CBIO/depmap documentation built on Aug. 18, 2024, 9:46 p.m.