```r
BiocStyle::markdown()
suppressPackageStartupMessages(library("tidyverse"))
```
The `depmap` package aims to provide a reproducible research framework for the cancer dependency data described by Tsherniak, Aviad, et al. "Defining a cancer dependency map." Cell 170.3 (2017): 564-576. The data in the `depmap` package has been formatted so that it works well with common R packages such as `dplyr` and `ggplot2`. We hope that this package will allow researchers to more easily mine, explore and visually illustrate dependency data from the DepMap cancer genomic dependency study.
Installing `depmap` requires `BiocManager`, the Bioconductor project package manager. If `BiocManager` is not already installed, install it first from within R with `install.packages("BiocManager")`. This only needs to be done once.

```r
## install BiocManager only if it is missing, then install depmap
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("depmap")
```
The `depmap` package depends on the `ExperimentHub` Bioconductor package, which allows the data accessed through this package to be stored in and retrieved from the cloud.

```r
library("depmap")
library("ExperimentHub")
```
The `depmap` package currently contains nine datasets available through `ExperimentHub`.

The data in this R package has been converted from "wide" format `.csv` files to "long" format `.rda` files. None of the values taken from the original datasets have been changed, although the columns have been rearranged. The changes are described in the Details section of the documentation for each dataset.
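To make the "wide" versus "long" distinction concrete, the toy example below reshapes a small wide table (one column per cell line) into long format (one row per gene and cell line pair) with `tidyr::pivot_longer()`. It is only an illustration under stated assumptions: the gene names, cell line identifiers and values are invented, and the column names `depmap_id` and `dependency` are chosen for readability. The actual conversion steps are the ones in `make_data.R`, which may differ.

```r
library(tibble)
library(tidyr)

# Toy wide table: one row per gene, one column per cell line.
# All identifiers and values are hypothetical, not real DepMap data.
wide <- tibble(
    gene_name = c("GENE_A", "GENE_B"),
    line_1    = c(-0.52, 0.10),
    line_2    = c(-1.31, 0.04)
)

# Reshape to long format: every cell-line column becomes rows, so each
# (gene, cell line) pair gets its own row and no value is altered.
long <- pivot_longer(
    wide,
    cols      = -gene_name,
    names_to  = "depmap_id",
    values_to = "dependency"
)
long
```

The long table has one row per measurement, which is the shape that `dplyr` verbs and `ggplot2` aesthetics expect.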
```r
## create ExperimentHub query object
eh <- ExperimentHub()
query(eh, "depmap")
```
Each dataset has an `ExperimentHub` accession number (e.g. `EH2260` refers to the `rnai` dataset from the 19Q1 release).
The `rnai` dataset contains the combined genetic dependency data for RNAi-induced gene knockdown of select genes in cancer cell lines. This data corresponds to the `D2_combined_genetic_dependency_scores.csv` file.

Specific `rnai` datasets, such as `rnai_19Q1`, can be accessed by EH number.

```r
eh[["EH2260"]]
```

The most recent `rnai` dataset can be loaded into R automatically with the `depmap_rnai()` function.

```r
depmap::depmap_rnai()
```
The `crispr` dataset contains the CRISPR-Cas9 knockout data (batch-corrected, CERES-inferred gene effects) of select genes in cancer cell lines. This data corresponds to the `gene_effect_corrected.csv` file.

Specific `crispr` datasets, such as `crispr_19Q1`, can be accessed by EH number.

```r
eh[["EH2261"]]
```

The most recent `crispr` dataset can be loaded into R automatically with the `depmap_crispr()` function.

```r
depmap::depmap_crispr()
```
The `copyNumber` dataset contains the WES copy number data, i.e. the log-fold copy number change of select genes and cell lines, measured against their baseline copy number. This dataset corresponds to the `public_19Q1_gene_cn.csv` file.

Specific `copyNumber` datasets, such as `copyNumber_19Q1`, can be accessed by EH number.

```r
eh[["EH2262"]]
```

The most recent `copyNumber` dataset can be loaded into R automatically with the `depmap_copyNumber()` function.

```r
depmap::depmap_copyNumber()
```
The `RPPA` dataset contains the CCLE Reverse Phase Protein Array (RPPA) data, which corresponds to the `CCLE_RPPA_20180123.csv` file.

Specific `RPPA` datasets, such as `RPPA_19Q1`, can be accessed by EH number.

```r
eh[["EH2263"]]
```

The most recent `RPPA` dataset can be loaded into R automatically with the `depmap_RPPA()` function.

```r
depmap::depmap_RPPA()
```
The `TPM` dataset contains the CCLE RNA-seq gene expression data. It covers protein-coding genes only, with values on a log2(TPM+1) scale. This data corresponds to the `CCLE_depMap_19Q1_TPM.csv` file.

Specific `TPM` datasets, such as `TPM_19Q1`, can be accessed by EH number.

```r
eh[["EH2264"]]
```

The `TPM` dataset can also be accessed with the `depmap_TPM()` function.

```r
depmap::depmap_TPM()
```
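As a worked example of the log2(TPM+1) scale, the snippet below transforms a few made-up TPM values and then back-transforms them. The pseudocount of 1 keeps unexpressed genes (TPM = 0) at 0 on the log scale instead of `-Inf`. The input values are hypothetical and are not taken from the dataset.

```r
# Hypothetical TPM values, for illustration only
tpm <- c(0, 1, 3, 15, 255)

# Transform to the log2(TPM + 1) scale used by the TPM dataset
log_tpm <- log2(tpm + 1)
log_tpm  # 0 1 2 4 8

# Invert the transformation to recover the original TPM values
back <- 2^log_tpm - 1
isTRUE(all.equal(back, tpm))  # TRUE
```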
The `metadata` dataset contains metadata about all of the cancer cell lines. It corresponds to the `depmap_19Q1_cell_lines.csv` file.

Specific `metadata` datasets, such as `metadata_19Q1`, can be accessed by EH number.

```r
eh[["EH2266"]]
```

The most recent `metadata` dataset can be loaded into R automatically with the `depmap_metadata()` function.

```r
depmap::depmap_metadata()
```
The `mutationCalls` dataset contains all merged mutation calls (coding region, germline filtered) found in the DepMap dependency study. This dataset corresponds to the `depmap_19Q1_mutation_calls.csv` file.

Specific `mutationCalls` datasets, such as `mutationCalls_19Q1`, can be accessed by EH number.

```r
eh[["EH2265"]]
```

The most recent `mutationCalls` dataset can be loaded into R automatically with the `depmap_mutationCalls()` function.

```r
depmap::depmap_mutationCalls()
```
The `drug_sensitivity` dataset contains dependency data for cancer cell lines treated with various compounds. This dataset corresponds to the `primary_replicate_collapsed_logfold_change.csv` file.

Specific `drug_sensitivity` datasets, such as `drug_sensitivity_19Q3`, can be accessed by EH number.

```r
eh[["EH3087"]]
```

The most recent `drug_sensitivity` dataset can be loaded into R automatically with the `depmap_drug_sensitivity()` function.

```r
depmap::depmap_drug_sensitivity()
```
The `proteomic` dataset contains normalized quantitative protein profiles of cancer cell lines, measured by mass spectrometry. This dataset corresponds to the file https://gygi.med.harvard.edu/sites/gygi.med.harvard.edu/files/documents/protein_quant_current_normalized.csv.gz.

Specific `proteomic` datasets, such as `proteomic_20Q2`, can be accessed by EH number.

```r
eh[["EH3459"]]
```

The most recent `proteomic` dataset can be loaded into R automatically with the `depmap_proteomic()` function.

```r
depmap::depmap_proteomic()
```
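The accession numbers quoted above can be kept together in a named vector, so that each dataset can be requested by a readable name. This is a sketch under assumptions: it lists only the EH numbers given in this vignette (newer releases have other accession numbers), and both `depmap_eh` and the helper `get_depmap_dataset()` are hypothetical names, not part of the `depmap` package.

```r
# EH accession numbers of the datasets described in this vignette
depmap_eh <- c(
    rnai_19Q1             = "EH2260",
    crispr_19Q1           = "EH2261",
    copyNumber_19Q1       = "EH2262",
    RPPA_19Q1             = "EH2263",
    TPM_19Q1              = "EH2264",
    mutationCalls_19Q1    = "EH2265",
    metadata_19Q1         = "EH2266",
    drug_sensitivity_19Q3 = "EH3087",
    proteomic_20Q2        = "EH3459"
)

# Hypothetical helper (not part of depmap): look up the EH number for a
# dataset name and retrieve it from an ExperimentHub object with `[[`
get_depmap_dataset <- function(eh, name) {
    stopifnot(name %in% names(depmap_eh))
    eh[[depmap_eh[[name]]]]
}

# Usage (requires network access):
# eh <- ExperimentHub::ExperimentHub()
# rnai <- get_depmap_dataset(eh, "rnai_19Q1")
```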
If desired, the original data from which the `depmap` package was derived can be downloaded from the Broad Institute website. Instructions on how to download these files, and on how the data was transformed and loaded into the `depmap` package, can be found in the `make_data.R` file in `./inst/scripts`. (Note that the original uncompressed `.csv` files total more than 1.5 GB and take a moderate amount of time to download.)
In addition to the repackaged files, the package also lets users download any of the original files provided by the DepMap project on Figshare.

A list of all the datasets is available with the `dmsets()` function:

```r
dmsets()
```
We can check which datasets from any quarter of 2020 are available by searching for `"20Q"` in the dataset titles:

```r
library(tidyverse)
dmsets() |>
    filter(grepl("20Q", title))
```
Let's focus on the PRISM Repurposing 20Q2 Dataset, with identifier `20564034`.

A list of all the files is available with the `dmfiles()` function:

```r
dmfiles()
```
To find all files from the PRISM Repurposing 20Q2 Dataset identified above, we can filter the files by its `dataset_id`:

```r
dmfiles() |>
    filter(dataset_id == 20564034)
```
Let's now focus on the `prism-repurposing-20q2-primary-screen-cell-line-info.csv` file. We can select it by its name and download it with `dmget()`:

```r
dmfiles() |>
    filter(name == "prism-repurposing-20q2-primary-screen-cell-line-info.csv") |>
    dmget()
```
The `dmget()` function first checks whether the file has already been downloaded and cached in the depmap cache directory (see `?dmCache()`). If so, it retrieves the file from there. Otherwise, it downloads the file and stores it in the package cache directory. In both cases, it returns the location of the cached file.
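The check-then-download behaviour of `dmget()` follows a common caching pattern. The function below is a minimal, generic sketch of that pattern written for illustration only; it is not the `depmap` implementation, which manages its own cache directory (see `?dmCache()`). The name `cached_path()`, its `fetch` argument and the example URL are hypothetical, and `fetch` is a parameter so that the logic can be run offline.

```r
# Generic cache-then-download sketch (NOT depmap's own code).
# `fetch` is any function(url, destfile) that writes the file to disk.
cached_path <- function(url, cache_dir,
                        fetch = function(url, destfile)
                            utils::download.file(url, destfile, mode = "wb")) {
    dir.create(cache_dir, showWarnings = FALSE, recursive = TRUE)
    # Use the last component of the URL as the local file name
    dest <- file.path(cache_dir, basename(url))
    if (!file.exists(dest)) {
        # Not cached yet: download once into the cache directory
        fetch(url, dest)
    }
    # Cached or freshly downloaded, return the location of the local file
    dest
}

# Offline usage with a fake fetcher that counts how often it is called
n_downloads <- 0
fake_fetch <- function(url, destfile) {
    n_downloads <<- n_downloads + 1
    writeLines("depmap_id,cell_line", destfile)
}
cache <- tempfile("dm_cache_demo_")
url <- "https://example.org/files/cell-line-info.csv"
p1 <- cached_path(url, cache, fake_fetch)
p2 <- cached_path(url, cache, fake_fetch)
n_downloads  # 1: the second call is served from the cache
```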
Since the file is in CSV format, we can open it directly with `read_csv()`:

```r
dmfiles() |>
    filter(name == "prism-repurposing-20q2-primary-screen-cell-line-info.csv") |>
    dmget() |>
    read_csv()
```
It is also possible to pass multiple rows of the `dmfiles()` table to `dmget()` to retrieve multiple file paths. Below, we retrieve all the `README.txt` files from 2020:

```r
## dataset identifiers for all 2020 releases
ids_2020 <- filter(dmsets(), grepl("20Q", title)) |>
    pull(dataset_id)

## download (or retrieve from cache) every README from those datasets
dmfiles() |>
    filter(dataset_id %in% ids_2020) |>
    filter(grepl("README", name)) |>
    dmget()
```
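The `%in%` filter above can also be written as a semi-join, which keeps the rows of the files table that have a matching `dataset_id` in the filtered datasets table without adding any of its columns. So that the sketch runs offline, small made-up tibbles stand in for `dmsets()` and `dmfiles()`; the identifiers, titles and file names are hypothetical, and only the `dataset_id`, `title` and `name` columns used above are assumed.

```r
library(tibble)
library(dplyr)

# Hypothetical stand-ins for dmsets() and dmfiles()
sets <- tibble(
    dataset_id = c(1, 2, 3),
    title      = c("Example 19Q4 release", "Example 20Q1 release",
                   "Example 20Q2 release")
)
files <- tibble(
    dataset_id = c(1, 2, 2, 3, 3),
    name       = c("README.txt", "README.txt", "gene_effect.csv",
                   "README.txt", "cell-line-info.csv")
)

# Keep files whose dataset title mentions a 2020 quarter, then only READMEs
readmes_2020 <- files |>
    semi_join(filter(sets, grepl("20Q", title)), by = "dataset_id") |>
    filter(grepl("README", name))
readmes_2020
```

With the real tables, the result of such a pipeline could be passed to `dmget()` in the same way as above.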
```r
sessionInfo()
```