HenryWard/orthrus: Scoring Combinatorial CRISPR Screens

knitr::opts_chunk$set(
  collapse = TRUE,
  eval = FALSE,
  comment = "#>"
)

Introduction

The Orthrus package contains all the computational tools you need to process, score and analyze combinatorial CRISPR screening data.

This document will guide you through the process of scoring a published combinatorial screening dataset. Key features of this dataset are summarized below, and in-depth descriptions of this dataset and how it was originally scored are available in Gonatopoulos-Pournatzis et al.

Orthrus protocol

A more detailed walkthrough of how to apply Orthrus and analyze combinatorial screening data is forthcoming in a separate manuscript.

Scoring interfaces

Orthrus offers three scoring interfaces: manual, batch and wrapper.

The manual interface affords fine-grained control over all aspects of data processing and scoring, but because it is relatively verbose and roughly equivalent to the batch interface for most steps, it is not recommended for most applications. This interface is detailed in the protocol linked above.
The batch interface requires users to specify two input tables in addition to a reads file before running scoring. This interface is recommended for most experiments and is demonstrated below.
The wrapper interface can run the entire package with a single function call. However, this is only recommended for experienced users, because for most applications the batch interface makes it easier to re-run parts of the scoring process with updated parameters (e.g. to change FDR thresholds). This interface is detailed in the protocol linked above.

Data description

Six combinatorial CRISPR screens across HAP1 and RPE1 cell lines
HAP1 screens include wild-type T12 and T18 data, and cells treated with the mTOR inhibitor Torin1 and a Torin1
RPE1 screens include wild-type T18 and T24
HAP1 and RPE1 T0 replicates for computing log fold-changes
CHyMErA library detailed in Gonatopoulos-Pournatzis et al. applied to all screens
Library contains dual-targeting guides that target the same gene twice, combinatorial-targeting guides that target paralog gene pairs, and single-targeting guides that target an exonic region and an intergenic region

Important publications

Please refer to the following publications for more information on the CHyMErA experimental platform, CRISPR screens and scoring them, or alternative approaches for scoring combinatorial CRISPR screening data.

Prerequisites

To follow this vignette, familiarity with CRISPR screening technology is strongly recommended. Familiarity with combinatorial CRISPR screening platforms or other ways to score CRISPR screening data is recommended, but not required.

Walkthrough

Setting up

Install Orthrus and its dependencies if necessary.

# Installs CRAN packages
install.packages("ggplot2")
install.packages("ggthemes")
install.packages("pheatmap")
install.packages("PRROC")
install.packages("RColorBrewer")

# Installs the limma package from Bioconductor
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("limma")

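# Installs Orthrus from GitHub (installs devtools first if it is missing)
if (!requireNamespace("devtools", quietly = TRUE))
    install.packages("devtools")
devtools::install_github("HenryWard/orthrus")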

Load packages.

library(orthrus)

Create output folders.

# Renames dataset
df <- chymera_paralog

# Makes output folders
output_folder <- file.path("vignette_output")
plot_folder <- file.path(output_folder, "scored")
qc_folder <- file.path(output_folder, "qc")
lfc_folder <- file.path(qc_folder, "lfc")
if (!dir.exists(output_folder)) { dir.create(output_folder, recursive = TRUE) }
if (!dir.exists(plot_folder)) { dir.create(plot_folder) }
if (!dir.exists(qc_folder)) { dir.create(qc_folder) }
if (!dir.exists(lfc_folder)) { dir.create(lfc_folder) }

Call the add_screens_from_table function to build up a list of screens with names and corresponding technical replicates, starting with T0 replicates.

screens <- add_screens_from_table(chymera_sample_table)
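To make the idea of "a list of screens with names and corresponding technical replicates" concrete, here is a minimal base-R sketch that builds a comparable list from a small table. The column names (Screen, Replicates, NormalizeTo), the ";" separator and the helper build_screens_sketch are assumptions made for illustration only. They are not taken from the package documentation, so check ?add_screens_from_table for the real format.

```r
# Hypothetical sample table; column names and the ";" separator are assumptions.
sample_table <- data.frame(
  Screen = c("HAP1_T0", "HAP1_T18"),
  Replicates = c("HAP1.T0.A;HAP1.T0.B", "HAP1.T18.A;HAP1.T18.B;HAP1.T18.C"),
  NormalizeTo = c(NA, "HAP1_T0"),
  stringsAsFactors = FALSE
)

# Illustrative helper (not an orthrus function): one list entry per screen,
# holding its replicate column names and the screen it is normalized to.
build_screens_sketch <- function(tab) {
  screens <- lapply(seq_len(nrow(tab)), function(i) {
    list(
      replicates = strsplit(tab$Replicates[i], ";", fixed = TRUE)[[1]],
      normalize_name = if (is.na(tab$NormalizeTo[i])) NULL else tab$NormalizeTo[i]
    )
  })
  names(screens) <- tab$Screen
  screens
}

toy_screens <- build_screens_sketch(sample_table)
str(toy_screens)
```

In this sketch the T0 screen has no normalize_name, which matches the later description of screens that are not normalized to other screens.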

Processing and QC

The first thing we want to do is make quality-control plots for raw read counts with the function plot_reads_qc. Output these to the previously-created QC folder.

plot_reads_qc(df, screens, qc_folder)
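Alongside the plots, it can help to glance at a few raw-count summaries by hand. The base-R sketch below uses made-up counts and an arbitrary cutoff of 30 reads. It is not a description of what plot_reads_qc computes, only an example of the kind of quantity worth checking.

```r
# Toy raw counts: rows are guides, columns are samples (made-up numbers).
reads <- data.frame(
  T0_A = c(120, 0, 340, 25, 515),
  T0_B = c(100, 10, 300, 40, 550)
)

# Sequencing depth per sample.
total_reads <- colSums(reads)

# Fraction of guides with fewer than 30 reads in each sample.
frac_low <- colMeans(reads < 30)

rbind(total_reads, frac_low)
```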

Now we need to normalize each screen in three different ways:

To their respective T0 screens by computing log fold-changes (LFCs)
To the respective depth of each technical replicate
By removing guides with too few or too many reads at T0

The function normalize_screens automatically performs all of these normalization steps. The function infers which columns of df need to be normalized to which T0 screens based on the normalize_name parameter of each screen in screens (screens without this optional parameter will not be normalized to other screens). Log-scaling and depth-normalization are performed on each screen regardless of the normalize_name parameter. For example, after normalization, T0 columns in df will contain log-scaled, depth-normalized read counts, whereas columns from later timepoints will contain depth-normalized LFCs compared to their respective T0s.

df <- normalize_screens(df, screens, filter_names = c("HAP1_T0", "RPE1_T0"), min_reads = 30)
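The three normalization steps can be illustrated on a toy example in base R. This is a sketch under stated assumptions, not the package's implementation: the scaling factor (1e6), the pseudocount (1), the helper log_norm, and the choice to mask filtered guides with NA rather than drop them are all illustrative.

```r
# Toy raw read counts: 5 guides, one T0 replicate and one T18 replicate.
counts <- data.frame(
  guide = paste0("g", 1:5),
  T0 = c(10, 200, 400, 800, 590),
  T18 = c(5, 100, 400, 1600, 895)
)

# Depth-normalize to a common scale, then log2-transform with a pseudocount.
# The scale factor and pseudocount are illustrative assumptions.
log_norm <- function(x, scale = 1e6, pseudocount = 1) {
  log2((x / sum(x)) * scale + pseudocount)
}

t0_log <- log_norm(counts$T0)
t18_log <- log_norm(counts$T18)

# LFC of the later timepoint relative to its T0.
lfc <- t18_log - t0_log

# Mask guides with too few raw T0 reads (similar in spirit to min_reads = 30).
keep <- counts$T0 >= 30
lfc[!keep] <- NA

data.frame(guide = counts$guide, t0_log, lfc)
```

Because each column is divided by its own total before the log transform, a guide whose raw count stays the same between T0 and T18 (g3 here) can still have a negative LFC when the T18 sample was sequenced more deeply.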

Make detailed QC plots for LFC data by calling the plot_lfc_qc function, specifying the gene names of any negative control guides with the negative_controls parameter.

plot_lfc_qc(df, screens, qc_folder, display_numbers = FALSE, plot_type = "pdf", negative_controls = c("NT"))
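One check that is commonly applied to LFC data is agreement between technical replicates. The base-R sketch below computes a Pearson correlation matrix on simulated LFCs. The data and column names are invented, and this is not a statement about which plots or metrics plot_lfc_qc produces.

```r
set.seed(1)

# Simulated LFCs for two technical replicates that share a true signal.
true_lfc <- rnorm(200, mean = -0.5, sd = 1)
lfc_reps <- data.frame(
  rep_A = true_lfc + rnorm(200, sd = 0.3),
  rep_B = true_lfc + rnorm(200, sd = 0.3)
)

# Pearson correlation matrix between replicates.
rep_cor <- cor(lfc_reps, method = "pearson")
print(round(rep_cor, 2))
```

A replicate that correlates poorly with the others from the same screen is a candidate for closer inspection before scoring.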

The last thing we need to do before scoring data is parse it into a different structure and split guides by their type, since we score dual-targeting guides separately from combinatorial-targeting guides.

guides <- split_guides(df, screens, "Cas9.Guide", "Cpf1.Guide")
dual <- guides[["dual"]]
single <- guides[["single"]]
paralogs <- guides[["combn"]]
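The three guide types follow from the library design described earlier: dual-targeting guides hit the same gene twice, combinatorial-targeting guides hit two different genes, and single-targeting guides pair one gene with an intergenic region. The base-R sketch below classifies toy guide pairs on that basis. The column names gene1 and gene2, the "intergenic" label and the helper classify_guide are hypothetical and do not reflect the columns of chymera_paralog or the internals of split_guides.

```r
# Toy guide annotations; column names and the "intergenic" label are hypothetical.
toy <- data.frame(
  gene1 = c("A", "A", "B", "intergenic", "C"),
  gene2 = c("A", "B", "intergenic", "C", "C"),
  stringsAsFactors = FALSE
)

# Illustrative helper (not an orthrus function).
classify_guide <- function(g1, g2, control = "intergenic") {
  n_control <- (g1 == control) + (g2 == control)
  ifelse(n_control == 1, "single",
         ifelse(n_control == 0 & g1 == g2, "dual",
                ifelse(n_control == 0, "combn", "other")))
}

toy$type <- classify_guide(toy$gene1, toy$gene2)

# One data frame per guide type, similar in shape to the list indexed above.
split(toy, toy$type)
```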

Batch scoring

To score data with the batch scoring interface, we call the score_conditions_batch and score_combn_batch functions separately.

batch_table <- chymera_batch_table
score_conditions_batch(dual, screens, batch_table, output_folder, 
                       test = "moderated-t", loess = TRUE, filter_genes = c("NT"),
                       neg_type = "Sensitizer", pos_type = "Suppressor",
                       fdr_threshold = 0.1, differential_threshold = 0.5,
                       plot_type = "pdf")
score_combn_batch(paralogs, single, screens, batch_table, output_folder, 
                  test = "moderated-t", loess = TRUE, filter_genes = c("NT"),
                  neg_type = "Sensitizer", pos_type = "Suppressor",
                  fdr_threshold = 0.2, differential_threshold = 0.5,
                  plot_type = "pdf")
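To see how an FDR threshold and a differential effect threshold can jointly determine hit calls, consider the base-R sketch below. It is a simplified illustration, not the package's scoring code: the Benjamini-Hochberg adjustment, the strict comparisons, the "None" label, the helper call_hits and the numbers themselves are all assumptions. Only the parameter names and the "Sensitizer"/"Suppressor" labels are borrowed from the calls above.

```r
# Toy per-gene results: differential effect and raw p-value (made-up numbers).
res <- data.frame(
  gene = c("G1", "G2", "G3", "G4", "G5"),
  differential = c(-1.2, 0.8, -0.3, 0.1, -0.9),
  p_value = c(0.001, 0.004, 0.002, 0.6, 0.2)
)

# Benjamini-Hochberg adjustment (an assumed choice of correction).
res$fdr <- p.adjust(res$p_value, method = "BH")

# Illustrative helper (not an orthrus function): a gene is a hit only if it
# passes both thresholds; the sign of the effect decides the label.
call_hits <- function(res, fdr_threshold, differential_threshold,
                      neg_type = "Sensitizer", pos_type = "Suppressor") {
  sig <- res$fdr < fdr_threshold &
    abs(res$differential) > differential_threshold
  ifelse(!sig, "None", ifelse(res$differential < 0, neg_type, pos_type))
}

res$call <- call_hits(res, fdr_threshold = 0.1, differential_threshold = 0.5)
res
```

In this toy example G3 has a small adjusted p-value but is not called because its effect size is below the differential threshold, which shows why the two thresholds are worth tuning separately.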

Summary

This concludes a brief walkthrough of how to use Orthrus to score combinatorial CRISPR screening data. However, we left out arguably the most important step: manual sanity-checking and analysis of individual output plots, metrics and scored data. We strongly advise that users check all output files to investigate their data's quality, and revise data processing or analysis steps accordingly (e.g. filtering out T0 reads more strictly, tightening differential effect and FDR thresholds, manually removing problematic guides, ensuring that positive controls are called as significant hits). The accompanying protocol contains guidance for interpreting Orthrus' various output files and suggestions for how to change certain parameters.

HenryWard/orthrus documentation built on June 2, 2023, 10:28 p.m.
