BiocStyle::markdown()
suppressPackageStartupMessages(library("tidyverse"))
The depmap package aims to provide a reproducible research framework
for the cancer dependency data described by Tsherniak, Aviad, et
al. "Defining a cancer dependency map." Cell 170.3 (2017):
564-576. The data in the depmap
package have been formatted to facilitate the use of common R packages
such as dplyr and ggplot2. We hope that this package will allow
researchers to more easily mine, explore and visually illustrate
dependency data taken from the Depmap cancer genomic dependency study.
To install depmap, the BiocManager Bioconductor Project Package Manager is required. If BiocManager is not already installed, install it first by typing install.packages("BiocManager") within R. (This needs to be done just once.)
install.packages("BiocManager")
BiocManager::install("depmap")
The depmap package fully depends on the ExperimentHub Bioconductor
package, which allows the data accessed in this package to be stored
and retrieved from the cloud.
library("depmap")
library("ExperimentHub")
The depmap package currently provides nine types of dataset through
ExperimentHub, each described below.
The data found in this R package has been converted from a "wide"
format .csv file to "long" format .rda file. None of the values
taken from the original datasets have been changed, although the
columns have been re-arranged. The changes made are
described under the Details section after querying the relevant
dataset.
## create ExperimentHub query object
eh <- ExperimentHub()
query(eh, "depmap")
Each dataset has an ExperimentHub accession number (e.g. EH2260
refers to the rnai dataset from the 19Q1 release).
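The wide-to-long conversion mentioned above can be illustrated on a small toy table. This is only a sketch: the gene names, cell line names, and the `dependency` column name are made up for illustration, and `tidyr::pivot_longer()` is used here as one common way to reshape data, not necessarily the method used to build the package datasets.

```r
library(tidyr)

# Toy "wide" table: one row per gene, one column per cell line.
# All names and values are hypothetical.
wide <- data.frame(
    gene   = c("GENE_A", "GENE_B"),
    CELL_1 = c(-0.5, 0.1),
    CELL_2 = c(0.2, -1.3),
    CELL_3 = c(NA, 0.4)
)

# "Long" format: one row per gene / cell line pair.
long <- pivot_longer(
    wide,
    cols      = -gene,
    names_to  = "cell_line",
    values_to = "dependency"
)
long
```

In the long layout each measurement sits in its own row, which is the shape that dplyr verbs and ggplot2 aesthetics expect.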
The rnai dataset contains the combined genetic dependency data for
RNAi-induced gene knockdown for select genes and cancer cell
lines. This data corresponds to the
D2_combined_genetic_dependency_scores.csv file.
Specific rnai datasets, such as rnai_19Q1, can be accessed by EH number.
eh[["EH2260"]]
The most recent rnai dataset can be automatically loaded into R by
using the depmap_rnai function.
depmap::depmap_rnai()
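Once a long-format table is loaded, it can be explored with ordinary dplyr verbs. The sketch below uses a toy tibble rather than the real data, so it runs without downloading anything. The column names `cell_line`, `gene_name` and `dependency` are assumptions for illustration and should be checked against the actual dataset.

```r
library(dplyr)

# Toy stand-in for a long-format dependency table (hypothetical values).
toy_rnai <- tibble(
    cell_line  = c("LINE_1", "LINE_2", "LINE_3", "LINE_1", "LINE_2"),
    gene_name  = c("GENE_A", "GENE_A", "GENE_A", "GENE_B", "GENE_B"),
    dependency = c(-1.2, -0.3, 0.1, 0.05, -0.8)
)

# Keep one gene and sort its cell lines from lowest to highest score.
top_dependent <- toy_rnai |>
    filter(gene_name == "GENE_A") |>
    arrange(dependency)
top_dependent
```

The same pipeline should apply to the object returned by depmap_rnai(), provided the column names match.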
The crispr dataset contains the (batch corrected CERES inferred gene
effect) CRISPR-Cas9 knockout data of select genes and cancer cell
lines. This data corresponds to the gene_effect_corrected.csv file.
Specific crispr datasets, such as crispr_19Q1, can be accessed by
EH number.
eh[["EH2261"]]
The most recent crispr dataset can be automatically loaded into R by
using the depmap_crispr function.
depmap::depmap_crispr()
The copyNumber dataset contains the WES copy number data, relating
to the numerical log-fold copy number change measured against the
baseline copy number of select genes and cell lines. This dataset
corresponds to the public_19Q1_gene_cn.csv file.
Specific copyNumber datasets, such as
copyNumber_19Q1, can be accessed by EH number.
eh[["EH2262"]]
The most recent copyNumber dataset can be automatically loaded into
R by using the depmap_copyNumber function.
depmap::depmap_copyNumber()
The RPPA dataset contains the CCLE Reverse Phase Protein Array
(RPPA) data which corresponds to the CCLE_RPPA_20180123.csv file.
Specific RPPA datasets, such as RPPA_19Q1, can be accessed by EH
number.
eh[["EH2263"]]
The most recent RPPA dataset can be automatically loaded into R by
using the depmap_RPPA function.
depmap::depmap_RPPA()
The TPM dataset contains the CCLE RNAseq gene expression data. This
shows expression data only for protein coding genes (using scale
log2(TPM+1)). This data corresponds to the CCLE_depMap_19Q1_TPM.csv
file.
Specific TPM datasets, such as TPM_19Q1, can be accessed by EH number.
eh[["EH2264"]]
The TPM dataset can also be accessed by using the depmap_TPM function.
depmap::depmap_TPM()
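The log2(TPM+1) scale mentioned above can be converted back to TPM by inverting the transform. A minimal sketch in base R, using made-up TPM values:

```r
# Hypothetical TPM values for three genes.
tpm <- c(0, 1, 1023)

# Forward transform to the log2(TPM + 1) scale.
log_expr <- log2(tpm + 1)

# Inverse transform back to TPM.
tpm_back <- 2^log_expr - 1
```

The added 1 keeps genes with zero expression at 0 on the log scale instead of producing -Inf.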
The metadata dataset contains the metadata about all of the cancer
cell lines. It corresponds to the depmap_19Q1_cell_lines.csv file.
Specific metadata datasets, such as metadata_19Q1, can be accessed
by EH number.
eh[["EH2266"]]
The most recent metadata dataset can be automatically loaded into R by using
the depmap_metadata function.
depmap::depmap_metadata()
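Cell line metadata is typically combined with one of the measurement tables through a shared cell line identifier. The sketch below shows the idea on toy tibbles. The key column `depmap_id` and the `lineage` column are assumptions for illustration, and the real tables may use different names.

```r
library(dplyr)

# Toy dependency scores for one gene across three cell lines.
toy_dep <- tibble(
    depmap_id  = c("ID_1", "ID_2", "ID_3"),
    gene_name  = "GENE_A",
    dependency = c(-1.2, -0.3, 0.1)
)

# Toy metadata; ID_3 is deliberately missing.
toy_meta <- tibble(
    depmap_id = c("ID_1", "ID_2"),
    lineage   = c("skin", "lung")
)

# A left join keeps every score and fills unmatched metadata with NA.
annotated <- left_join(toy_dep, toy_meta, by = "depmap_id")
annotated
```

A left join is used so that measurements are never silently dropped when a cell line lacks a metadata entry.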
The mutationCalls dataset contains all merged mutation calls (coding
region, germline filtered) found in the depmap dependency study. This
dataset corresponds with the depmap_19Q1_mutation_calls.csv file.
Specific mutationCalls datasets, such as
mutationCalls_19Q1, can be accessed by EH number.
eh[["EH2265"]]
The most recent mutationCalls dataset can be automatically loaded into R by
using the depmap_mutationCalls function.
depmap::depmap_mutationCalls()
The drug_sensitivity dataset contains dependency data for cancer
cell lines treated with various compounds. This dataset corresponds
with the primary_replicate_collapsed_logfold_change.csv file.
Specific drug_sensitivity datasets, such as
drug_sensitivity_19Q3, can be accessed by EH number.
eh[["EH3087"]]
The most recent drug_sensitivity dataset can be automatically loaded
into R by using the depmap_drug_sensitivity function.
depmap::depmap_drug_sensitivity()
The proteomic dataset contains normalized quantitative profiling of
proteins of cancer cell lines by mass spectrometry. This dataset
corresponds with the
https://gygi.med.harvard.edu/sites/gygi.med.harvard.edu/files/documents/protein_quant_current_normalized.csv.gz
file.
Specific proteomic datasets, such as
proteomic_20Q2, can be accessed by EH number.
eh[["EH3459"]]
The most recent proteomic dataset can be automatically loaded into R by
using the depmap_proteomic function.
depmap::depmap_proteomic()
If desired, the original data from which the depmap package was
derived can be downloaded from the Broad Institute website. The
instructions on how to download these files, and how the data were
transformed and loaded into the depmap package, can be
found in the make_data.R file in ./inst/scripts. (It should
be noted that the original uncompressed .csv files are > 1.5GB in
total and take a moderate amount of time to download remotely.)
In addition to the re-packaged files, the package also allows users to download any of the original files provided by the DepMap project on Figshare.
A list of all the datasets is available with the dmsets() function:
dmsets()
We could check which datasets from any quarter of 2020 are available by
searching for "20Q" in the dataset titles:
library(tidyverse)
dmsets() |> filter(grepl("20Q", title))
Let's focus on the PRISM Repurposing 20Q2 Dataset, with
identifier 20564034.
A list of all the files is available with the dmfiles() function:
dmfiles()
If we want to find all files from the PRISM Repurposing 20Q2 Dataset
identified above, we could filter all files with its dataset_id:
dmfiles() |> filter(dataset_id == 20564034)
Let's now focus on the
prism-repurposing-20q2-primary-screen-cell-line-info.csv file. We
can filter it by its name and download it with dmget():
dmfiles() |> filter(name == "prism-repurposing-20q2-primary-screen-cell-line-info.csv") |> dmget()
The dmget() function first checks whether the file has already been
downloaded and cached in the depmap cache directory (see
?dmCache()). If so, it retrieves it from there. Otherwise, it
downloads the file and stores it in the package cache directory. In
both cases, it returns the location of the cached file.
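The check-then-download behaviour described above follows a common caching pattern, sketched below in base R. This is not the implementation of dmget(): `get_cached()` and `fake_fetch()` are hypothetical helpers, and the "download" is simulated by writing a small local file so the example runs offline.

```r
# Hypothetical helper: return the cached path for `name`, calling
# `fetch(path)` to create the file only when it is not cached yet.
get_cached <- function(name, cache_dir, fetch) {
    path <- file.path(cache_dir, name)
    if (!file.exists(path)) {
        dir.create(cache_dir, showWarnings = FALSE, recursive = TRUE)
        fetch(path)
    }
    path
}

# Stand-in for a download: writes a tiny csv and counts its calls.
n_fetches <- 0
fake_fetch <- function(path) {
    n_fetches <<- n_fetches + 1
    writeLines(c("id,name", "1,example"), path)
}

# Start from an empty toy cache inside the session's temp directory.
cache_dir <- file.path(tempdir(), "toy_cache")
unlink(cache_dir, recursive = TRUE)

p1 <- get_cached("info.csv", cache_dir, fake_fetch)  # file is fetched
p2 <- get_cached("info.csv", cache_dir, fake_fetch)  # served from cache
```

Both calls return the same path, and the fetch function runs only once, which is why repeated requests for the same file are cheap.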
Given that the file is in csv format, we can directly open it with
read_csv():
dmfiles() |> filter(name == "prism-repurposing-20q2-primary-screen-cell-line-info.csv") |> dmget() |> read_csv()
It is also possible to pass multiple rows of the dmfiles() table to
dmget() to retrieve multiple file paths. Below, let's get all the
README.txt files from 2020:
ids_2020 <- filter(dmsets(), grepl("20Q", title)) |> pull(dataset_id)
dmfiles() |> filter(dataset_id %in% ids_2020) |> filter(grepl("README", name)) |> dmget()
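The two-step selection above (collect dataset identifiers, then keep matching files) can be tried offline on toy tables. In this sketch the tibbles only mimic the `dataset_id`, `title` and `name` columns used in this section, and every value is made up.

```r
library(dplyr)

# Toy stand-ins for the dataset and file tables (hypothetical values).
toy_sets <- tibble(
    dataset_id = c(101, 102, 103),
    title      = c("Toy 19Q4 release", "Toy 20Q1 release", "Toy 20Q2 release")
)
toy_files <- tibble(
    dataset_id = c(101, 102, 102, 103),
    name       = c("README.txt", "README.txt", "data.csv", "README.txt")
)

# Step 1: identifiers of datasets whose title mentions a 2020 quarter.
ids_2020 <- toy_sets |>
    filter(grepl("20Q", title)) |>
    pull(dataset_id)

# Step 2: README files belonging to those datasets.
readmes_2020 <- toy_files |>
    filter(dataset_id %in% ids_2020, grepl("README", name))
readmes_2020
```

With the real tables, the resulting rows would then be passed on to dmget() to obtain the file paths.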
```r
sessionInfo()
```