In ytakemon/GINIR: Genetic inteRaction and EssenTiality neTwork mApper

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>", 
  fig.path = "inst/figures/README-",
  out.width = "100%"
)

Introduction

Genetic inteRaction and EssenTiality mApper (GRETTA) is an R package that leverages data generated by the Cancer Dependency Map (DepMap) project to perform in-silico genetic knockout screens and map essentiality networks. A manuscript describing this tool is available at bioinformatics (Takemon, Y. and Marra, MA., 2023).

The DepMap data used in this tutorial is version 22Q2. This version along with all versions provided in this repository were downloaded through the DepMap data portal, which was distributed and used under the terms and conditions of CC Attribution 4.0 license.

Maintainer

This repository is maintained by Yuka Takemon, a PhD candidate in Dr. Marco Marra's laboratory at Canada's Michael Smith Genome Sciences Centre.

Citations

When using GRETTA, please cite the manuscript describing GRETTA: Yuka Takemon, Marco A Marra, GRETTA: an R package for mapping in silico genetic interaction and essentiality networks, Bioinformatics, Volume 39, Issue 6, June 2023, btad381, https://doi.org/10.1093/bioinformatics/btad381

Please also cite the DepMap project and the appropriate data version found on https://depmap.org/portal/: Tsherniak A, Vazquez F, Montgomery PG, Weir BA, Kryukov G, Cowley GS, Gill S, Harrington WF, Pantel S, Krill-Burger JM, Meyers RM, Ali L, Goodale A, Lee Y, Jiang G, Hsiao J, Gerath WFJ, Howell S, Merkel E, Ghandi M, Garraway LA, Root DE, Golub TR, Boehm JS, Hahn WC. Defining a Cancer Dependency Map. Cell. 2017 Jul 27;170(3):564-576.

Questions

Please check the FAQ section for additional information and if you cannot find your answer there or have a request please submit an issue.

Requirements

GRETTA is supported and compatible for R versions >= 4.2.0.
12G of space to store one DepMap data set with and an additional 11G of temporary space to for .tar.gz prior to extraction.

Installation

Warning The new version of dbplyr (v2.4.0) is currently incompatable with another library used in GRETTA. If you encounter an error message like the one below. Please install the previous working version also shown below.

Error message:

Error in `collect()`: ! Failed to collect lazy table. Caused by error in `db_collect()`: ! Arguments in `...` must be used. ✖ Problematic argument: • ..1 = Inf ℹ Did you misspell an argument name?

Solution:

install.packages("devtools") devtools::install_version("dbplyr", version = "2.3.4")`

You can install the GRETTA package from GitHub with:

install.packages(c("devtools", "dplyr","forcats","ggplot2"))
devtools::install_github("ytakemon/GRETTA")

DepMap 22Q2 data and the data documentation files are provided above and can be extracted directly in terminal using the following bash code (not in R/RStudio). For other DepMap data versions please refer to the FAQ section.

# Make a new directory/folder called GRETTA_project and go into directory
mkdir GRETTA_project
cd GRETTA_project

# Download data from the web
wget https://www.bcgsc.ca/downloads/ytakemon/GRETTA/22Q2/GRETTA_DepMap_22Q2_data.tar.gz

# Extract data and data documentation
tar -zxvf GRETTA_DepMap_22Q2_data.tar.gz

A singularity container has also been provided and instructions can be found here.

Additional DepMap versions

In this example we use DepMap's 2022 data release (22Q2). However, we also provide previous data released in 2020 (v20Q1) and 2021 (v21Q4), which are available at :https://www.bcgsc.ca/downloads/ytakemon/GRETTA/. We are hoping to make new data sets available as the are released by DepMap.

Workflows

Genetic interaction mapping

Install GRETTA and download accompanying data.
Select mutant cell lines that carry mutations in the gene of interest and control cell lines.
- (optional specifications) can be used to select cell lines based on disease type, disease subtype, or amino acid change.
Determine differential expression between mutant and control cell line groups.
- (optional but recommended).
Perform in silico genetic screen.
Visualize results.

Co-essential network mapping

Install GRETTA and download accompanying data.
Run correlation coefficient analysis.
- (optional specifications) can be used to perform analysis on cell lines of a specific disease type(s).
Calculate inflection points of negative/positive curve to determine a threshold.
Apply threshold.
Visualize results.

Example: Identifying ARID1A genetic interactions

ARID1A encodes a member of the chromatin remodeling SWItch/Sucrose Non-Fermentable (SWI/SNF) complex and is a frequently mutated gene in cancer. It is known that ARID1A and its homolog, ARID1B, are synthetic lethal to one another: The dual loss of ARID1A and its homolog, ARID1B, in a cell is lethal; however, the loss of either gene alone is not (Helming et al., 2014). This example will demonstrate how we can identify synthetic lethal interactors of ARID1A using GRETTA and predict this known interaction.

For this example you will need to call the following libraries. If you they are not installed yet use install.packages() (eg. install.packages("dplyr")).

# Load library
library(tidyverse)
library(GRETTA)

Download example data

A small data set has been created for this tutorial and can be downloaded using the following code.

path <- getwd()
download_example_data(path)

Then, assign variable that point to where the .rda files are stored and where result files should go.

gretta_data_dir <- paste0(path,"/GRETTA_example/")
gretta_output_dir <- paste0(path,"/GRETTA_example_output/")

Exploring cell lines

One way to explore cell lines that are available in DepMap is through their portal. However, there are some simple built-in methods in GRETTA to provide users with a way to glimpse the data using the series of list_available functions: list_mutations(), list_cancer_types(), list_cancer_subtypes()

Current DepMap data used by default is version 22Q2, which contains whole-genome sequencing or whole-exome sequencing annotations for 1771 cancer cell lines (1406 cell lines with RNA-seq data, 375 cell lines with quantitative proteomics data, and 1086 cell lines with CRISPR-Cas9 knockout screen data)

## Find ARID1A hotspot mutations detected in all cell lines
list_mutations(gene = "ARID1A", is_hotspot = TRUE, data_dir = gretta_data_dir)

## List all available cancer types
list_cancer_types(data_dir = gretta_data_dir)

## List all available cancer subtypes
list_cancer_subtypes(input_disease = "Lung Cancer", data_dir = gretta_data_dir)

Selecting mutant and control cell line groups

As default select_cell_lines() will identify cancer cell lines with loss-of-function alterations in the gene specified and group them into six different groups.

Loss-of-function alterations include variants that are annotated as: "Nonsense_Mutation", "Frame_Shift_Ins", "Splice_Site", "De_novo_Start_OutOfFrame", "Frame_Shift_Del", "Start_Codon_SNP", "Start_Codon_Del", and "Start_Codon_Ins". Copy number alterations are also taken into consideration and group as "Deep_del", "Loss", "Neutral", or "Amplified".

The cell line groups assigned by default are:

Control cell lines do not harbor any single nucleotide variations (SNVs) or insertions and deletions (InDels) with a neutral copy number (CN).
HomDel cell lines harbor one or more homozygous deleterious SNVs or have deep CN loss.
T-HetDel cell lines harbor two or more heterozygous deleterious SNVs/InDels with neutral or CN loss.
HetDel cell lines harbor one heterozygous deleterious SNV/InDel with neutral CN, or no SNV/InDel with CN loss.
Amplified cell lines harbor no SNVs/InDels with increased CN.
Others cell lines harbor deleterious SNVs with increased CN.

ARID1A_groups <- select_cell_lines(input_gene = "ARID1A", data_dir = gretta_data_dir)

## Show number of cell lines in each group 
count(ARID1A_groups, Group)

Optional cell line filters

There are several additional filters that can be combined together to narrow down your search. These

input_aa_change - by amino acid change (eg. "p.Q515*").
input_disease - by disease type (eg. "Pancreatic Cancer")
input_disease_subtype - by disease subtype (eg. "Ductal Adenosquamous Carcinoma")

## Find pancreatic cancer cell lines with ARID1A mutations
ARID1A_pancr_groups <- select_cell_lines(input_gene = "ARID1A", 
                                         input_disease = "Pancreatic Cancer",
                                         data_dir = gretta_data_dir)

## Show number of cell lines in each group 
count(ARID1A_pancr_groups, Group)

Check for differential expression

Of the three mutant cancer cell line groups ARID1A_HomDel, ARID1A_T-HetDel, and ARID1A_HetDel, cancer cell lines with ARID1A_HomDel mutations are most likely to result in a loss or reduced expression of ARID1A. Therefore, we want to check whether cell lines in ARID1A_HomDel mutant group have significantly less ARID1A RNA or protein expression compared to control cell lines.

## Select only HomDel and Control cell lines
ARID1A_groups_subset <- ARID1A_groups %>% filter(Group %in% c("ARID1A_HomDel", "Control"))

## Get RNA expression 
ARID1A_rna_expr <- extract_rna(
  input_samples = ARID1A_groups_subset$DepMap_ID, 
  input_genes = "ARID1A",
  data_dir = gretta_data_dir)

Not all cell lines contain RNA and/or protein expression profiles, and not all proteins were detected by mass spectrometer. (Details on data generation can be found on the DepMap site.)

## Get protein expression
ARID1A_prot_expr <- extract_prot(
  input_samples = ARID1A_groups_subset$DepMap_ID,
  input_genes = "ARID1A",
  data_dir = gretta_data_dir)

## Produces an error message since ARID1A protein data is not available

Using Welch's t-test, we can check to see whether ARID1A RNA expression (in TPM) is significantly reduced in ARID1A_HomDel cell lines compared to Controls.

## Append groups and test differential expression
ARID1A_rna_expr <- left_join(
  ARID1A_rna_expr,
  ARID1A_groups_subset %>% select(DepMap_ID, Group)) %>%
  mutate(Group = fct_relevel(Group,"Control")) # show Control group first

## T-test 
t.test(ARID1A_8289 ~ Group, ARID1A_rna_expr)

## plot 
ggplot(ARID1A_rna_expr, aes(x = Group, y = ARID1A_8289)) +
  geom_boxplot()

Perform genome-wide in silico genetic screen

After determining cell lines in the ARID1A_HomDel group has statistically significant reduction in RNA expression compared to Control cell lines, the next step is to perform a in silico genetic screen using screen_results(). This uses the dependency probabilities (or "lethality probabilities") generated from DepMap's genome-wide CRISPR-Cas9 knockout screen.

Lethality probabilities range from 0.0 to 1.0 and is quantified for each gene knock out in every cancer cell line screened (There are 18,334 genes targeted in 739 cancer cell lines). A gene knock out with a lethality probability of 0.0 indicates a non-essential for the cell line, and a gene knock out with a 1.0 indicates an essential gene (ie. very lethal). Details can be found in Meyers, R., et al., 2017

At its core, screen_results() performs multiple Mann-Whitney U tests, comparing lethality probabilities of each targeted gene between mutant and control groups. This generates a data frame with the following columns:

GeneName_ID - Hugo symbol with NCBI gene ID
GeneNames - Hugo symbol
_median, _mean, _sd, _iqr - Control and mutant group's median, mean, standard deviation (sd), and interquartile range (iqr) of dependency probabilities. Dependency probabilities range from zero to one, where one indicates a essential gene (ie. KO of gene was lethal) and zero indicates a non-essential gene (KO of gene was not lethal)
Pval - P-value from Mann Whitney U test between control and mutant groups.
Adj_pval - BH-adjusted P-value.
log2FC_by_median - Log2 normalized median fold change of dependency probabilities (mutant / control). Dependency probabilities range from 0.0-1.0 where 1.0 indicates high probability of KO leading to lethality, while 0.0 indicates little to no lethality.
log2FC_by_mean - Log2 normalized mean fold change of dependency probabilities (mutant / control). Dependency probabilities range from 0.0-1.0 where 1.0 indicates high probability of KO leading to lethality, while 0.0 indicates little to no lethality.
CliffDelta - Cliff's delta non-parametric effect size between mutant and control dependency probabilities. Ranges between -1 to 1.
dip_pval - Hartigan's dip test p-value. Tests whether distribution of mutant dependency probability is unimodel. If dip test is rejected (p-value < 0.05), this indicates that there is a multimodel dependency probability distribution and that there may be another factor contributing to this separation.
Interaction_score - Combined value generated from signed p-values: -log10(Pval) * sign(log2FC_by_median). Positive scores indicate possible lethal genetic interaction, and negative scores indicate possible alleviating genetic interaction.

Warning This process may take a few hours depending on the number of cores assigned. Our example below GI_screen() took ~2 hours to process. To save time, we have preprocessed this step for you.

ARID1A_mutant_id <- ARID1A_groups %>% filter(Group %in% c("ARID1A_HomDel")) %>% pull(DepMap_ID)
ARID1A_control_id <- ARID1A_groups %>% filter(Group %in% c("Control")) %>% pull(DepMap_ID)

## See warning above.
## This can take several hours depending on number of lines/cores used. 
screen_results <- GI_screen(
  control_id = ARID1A_control_id, 
  mutant_id = ARID1A_mutant_id,
  core_num = 5, # depends on how many cores you have  
  output_dir = gretta_output_dir, # Will save your results here as well as in the variable
  data_dir = gretta_data_dir,
  test = FALSE) # use TRUE to run a short test to make sure all will run overnight.

## Load prepared ARID1A screen result
load(paste0(gretta_data_dir,"/sample_22Q2_ARID1A_KO_screen.rda"), envir = environment())

We can quickly determine whether any lethal genetic interactions were predicted by GRETTA. We use a Pval cut off of 0.05 and rank based on the Interaction_score.

screen_results %>% 
  filter(Pval < 0.05) %>%
  arrange(-Interaction_score) %>% 
  select(GeneNames:Mutant_median, Pval, Interaction_score) %>% head

We immediately see that ARID1B, a known synthetic lethal interaction of ARID1A, was a the top of this list.

Optional: Performing a small-scale screen

To perform a small in silico screen, a list of genes can be provided in the gene_list = argument.

small_screen_results <- GI_screen(
  control_id = ARID1A_control_id, 
  mutant_id = ARID1A_mutant_id,
  gene_list = c("ARID1A", "ARID1B", "SMARCA2", "GAPDH", "SMARCC2"),
  core_num = 5, # depends on how many cores you have  
  output_dir = gretta_output_dir, # Will save your results here as well as in the variable
  data_dir = gretta_data_dir)

Visualize screen results

Finally once the in silico screen is complete, results can be quickly visualized using plot_screen(). Positive genetic interaction scores indicate potential synthetic lethal genetic interactors, and negative scores indicate potential alleviating genetic interactors. As expected, we identified ARID1B as a synthetic lethal interactor of ARID1A.

## Visualize results, turn on gene labels, 
## and label three genes each that are predicted to have 
## lethal and alleviating genetic interactions, respectively

plot_screen(result_df = screen_results, 
            label_genes = TRUE, 
            label_n = 3)

Example: Identifying ARID1A co-essential genes

Perturbing genes that function in same/synergistic pathways or in the same complex are said to show similar fitness effects, and these that show effects are considered to be "co-essential". The strategy of mapping co-essential gene have been used by several studies to attribute functions to previously annotated genes as well as to identify a novel subunit of a large complex (Wainberg et al. 2021; Pan et al. 2018).

Given that ARID1A is known subunit of the mammalian SWI/SNF complex (Mashtalir et al. 2018), we expect that members of the SWI/SNF complex would share co-essentiality with ARID1A. This example will demonstrate how we can map ARID1A's co-essential gene network using GRETTA.

Identifying genes with highest correlation coefficients

To determine co-essential genes, we will perform multiple Pearson correlation coefficient analyses between ARID1A KO effects and the KO effects of all 18,333 genes. A cut off will be determined by calculating the inflection point of the ranked coefficient curve. As expected find SWI/SNF subunit encoding genes, SMARCE1 and SMARCB1, as the top two co-essential genes.

Warning This process may take several minutes. Our example below coessential_map() + get_inflection_points() took ~17 minutes to process. To save time we have pre-processed this setp for you.

## Map co-essential genes
coess_df <- coessential_map(
  input_gene = "ARID1A", 
  data_dir = gretta_data_dir, 
  output_dir = gretta_output_dir) 

## Calculate inflection points of positive and negative curve using co-essential gene results.
coess_inflection_df <- get_inflection_points(coess_df)

# Load pre-processed results
load(paste0(gretta_data_dir,"/sample_22Q2_ARID1A_coessential_result.rda"), envir = environment())
load(paste0(gretta_data_dir,"/sample_22Q2_ARID1A_coessential_inflection.rda"), envir = environment())

Next, we annotate the data frame containing the co-essential network data and visualize.

## Combine and annotate data frame containing co-essential genes
coess_annotated_df <- annotate_df(coess_df, coess_inflection_df)

plot_coess(
  result_df = coess_annotated_df, 
  inflection_df = coess_inflection_df,
  label_genes = TRUE, # Should gene names be labeled?
  label_n = 3) # Number of genes to display from each end

We also see that the top ten ARID1A co-essential genes include eight known SWI/SNF subunits, namely ARID1A, SMARCB1, SMARCE1, SMARCC1, SS18, DPF2, SMARCC2, and SMARCD2.

## Show top 10 co-essential genes. 
coess_annotated_df %>% arrange(Rank) %>% head(10)

Optional filter for specific cancer types

Instead of mapping for essentiality across all available cell lines, users can also subset by disease type using the option input_disease = "", or within a pre-selected group of cell lines using the option input_cell_lines = c(). Below we provide an example of how ARID1A essential genes are mapped for pancreatic cancers.

Warning Depending on the number of cell lines that are available after the subsetting step, the inflection point calculation and thresholds may not be optimal. Please use caution when interpreting these results.

## Map co-essential genes in pancreatic cancers only
coess_df <- coessential_map(
  input_gene = "ARID1A",
  input_disease = "Pancreatic Cancer",
  core_num = 5, ## Depending on how many cores you have access to, increase this value to shorten processing time.
  data_dir = gretta_data_dir, 
  output_dir = gretta_output_dir,
  test = FALSE)

Optional filter for custom cell lines

We can also map essentiality across a manually defined list of cell lines using the input_cell_lines = c() option.

Warning Depending on the number of cell lines provided, the inflection point may not be calculated. Please use caution when interpreting these results.

custom_lines <- c("ACH-000001", "ACH-000002", "ACH-000003",...)

coess_df <- coessential_map(
  input_gene = "ARID1A",
  input_cell_lines = custom_lines,
  core_num = 5, ## Depending on how many cores you have access to, increase this value to shorten processing time.
  data_dir = gretta_data_dir, 
  output_dir = gretta_output_dir,
  test = FALSE)

Session information

sessionInfo()

ytakemon/GINIR documentation built on June 13, 2025, 12:40 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

ytakemon/GINIR
Genetic inteRaction and EssenTiality neTwork mApper

In ytakemon/GINIR: Genetic inteRaction and EssenTiality neTwork mApper

Introduction

Maintainer

Citations

Questions

Requirements

Installation

Additional DepMap versions

Workflows

Genetic interaction mapping

Co-essential network mapping

Example: Identifying ARID1A genetic interactions

Download example data

Exploring cell lines

Selecting mutant and control cell line groups

Optional cell line filters

Check for differential expression

Perform genome-wide in silico genetic screen

Optional: Performing a small-scale screen

Visualize screen results

Example: Identifying ARID1A co-essential genes

Identifying genes with highest correlation coefficients

Optional filter for specific cancer types

Optional filter for custom cell lines

Session information

R Package Documentation

Browse R Packages

We want your feedback!

ytakemon/GINIR Genetic inteRaction and EssenTiality neTwork mApper

In ytakemon/GINIR: Genetic inteRaction and EssenTiality neTwork mApper

Introduction

Maintainer

Citations

Questions

Requirements

Installation

Additional DepMap versions

Workflows

Genetic interaction mapping

Co-essential network mapping

Example: Identifying ARID1A genetic interactions

Download example data

Exploring cell lines

Selecting mutant and control cell line groups

Optional cell line filters

Check for differential expression

Perform genome-wide in silico genetic screen

Optional: Performing a small-scale screen

Visualize screen results

Example: Identifying ARID1A co-essential genes

Identifying genes with highest correlation coefficients

Optional filter for specific cancer types

Optional filter for custom cell lines

Session information

R Package Documentation

Browse R Packages

We want your feedback!

ytakemon/GINIR
Genetic inteRaction and EssenTiality neTwork mApper