This is a companion vignette briefly describing the advanced capabilites available for comparing, summarizing, and integrating screening contrasts with gCrisprTools, introduced with version 2.0.

Simplified Results Objects

Historically, gCrisprTools was focused on making results data.frames, which exhaustively summarize the results of a screen contrast. They look like this:

library(Biobase)
library(gCrisprTools)
knitr::opts_chunk$set(message = FALSE, fig.width = 8, fig.height = 8)

data(resultsDF)

head(resultsDF)

In many applications, there are limitations to these objects. For example:

To help with downstream analyses, we introduce the simpleResult object

res <- ct.simpleResult(resultsDF, collapse = 'geneSymbol')
head(res)

These objects are simpler representations of target signals that are applicable for "most" situations (e.g., when the gRNAs associated with a target have similar effects). They are:

This format enables better comparisons between screens, and we have some helper functions to support this:

# Make "another" result
res2 <- res
res2$best.p <- res2$best.p * runif(nrow(res2)) 
res2$direction[sample(1:nrow(res), 500)] <- res2$direction[sample(1:nrow(res), 500)]


regularized <- ct.regularizeContrasts(dflist = list('Experiment1' = res[1:1500,], 
                                                    'Experiment2' = res2[1:1900,]), 
                                      collapse = 'geneSymbol')

str(regularized)

The above function takes in a (named) list of results objects and creates a list of simpleResults with the same names that are "in register" with one another, using the shared elements specified by collapse.

The main utility of this function is twofold. First, it enables comparison of lightweight results objects that are derived from screens that could have been executed with different libraries, systems, or technologies by focusing on the targets and their directional significance measures. Second, this "named list of standardized results" can become a standard structure for encoding grouped contrasts for comparison purposes, allowing us to do more sophisticated comparisons between and across contrasts.

Testing Overlaps Between Two or More Screens

The standardized format allows straightforward identification of signal overlaps observed within multiple screens via ct.compareContrasts(). By default, these comparisons are done conditionally, but users may specify more rigid criteria.

comparison <- ct.compareContrasts(dflist = regularized,
                                  statistics = c('best.q', 'best.p'), 
                                  cutoffs = c(0.5,0.05), 
                                  same.dir = rep(TRUE, length(regularized)))

head(comparison, 30)

This function returns a simplifiedResult version of the mainresult argument, appended with a logical column indicating whether signals in the mainresult contrast passing the first significance cutoff are replicated in the validationresult contrast at the second significance cutoff. This sort of conditional scoring was shown previously to be useful in interpreting the results of validation and counterscreens.

For convenience, we can also return summary statistics characterizing the overlap between the screens:

ct.compareContrasts(dflist = regularized,
                    statistics = c('best.q', 'best.p'),
                    cutoffs = c(0.5,0.05), 
                    same.dir = rep(TRUE, length(regularized)),
                    return.stats = TRUE)

This can be useful when exploring appropriate cutoff thresholds, or when asking broader questions about overall congruence.

Visualization of Screen Enrichment and Depletion Dynamics

We provide a number of methods for visualizing and contextualizing the overall signals present within sets of screens. One of the simplest is the cointrast barcharts, which represent the number of observed enrichment and depletion signals in each screen according to specified significance criteria:

ct.contrastBarchart(regularized, background = FALSE, statistic = 'best.p')

In the above figure, each contrast is represented as a horizontal bar, and targets enriched and depleted are represented as bars extending to the right and the left of the vertical dotted line, respectively.

We can also compare the signals observed in two screens directly in a scatterplot:

scat <- ct.scatter(regularized, 
                   targets = 'geneSymbol', 
                   statistic = 'best.p', 
                   cutoff = 0.05)

This provides a scatter plot of the indicated statistic, with quadrants defined according to the user-specified cutoff. The number of targets in each of the quadrants is indicated in grey, and quadrants are keyed like this:

1 2 3 4 5 6 7 8 9

A more complete genewise picture can be achieved by examining the returned invisible object, which appends the relevant quadrants to assist in in focusing on particular targets of interest:

head(scat)

As an aside, this simplified infrastructure allows straightforward dissection of interaction effects as well. Often, it is helpful to identify targets that are enriched or depleted with respect to one contrast but inactive with respect to another (e.g., to identify genes that impact a screening phenotype but do not deplete over time).

library(Biobase)
library(limma)
library(gCrisprTools)

#Create a complex model design; removing the replicate term for clarity
data("es", package = "gCrisprTools")
data("ann", package = "gCrisprTools")

design <- model.matrix(~ 0 + TREATMENT_NAME, pData(es))
colnames(design) <- gsub('TREATMENT_NAME', '', colnames(design))
contrasts <-makeContrasts(ControlTime = ControlExpansion - ControlReference,
                          DeathOverTime = DeathExpansion - ControlReference,
                          Interaction = DeathExpansion - ControlExpansion, 
                          levels = design)

es <- ct.normalizeGuides(es, method = "scale") #See man page for other options
vm <- voom(exprs(es), design)

fit <- lmFit(vm, design)
fit <- contrasts.fit(fit, contrasts)
fit <- eBayes(fit)

allResults <- sapply(colnames(fit$coefficients), 
                     function(x){
                         ct.generateResults(fit,
                                            annotation = ann,
                                            RRAalphaCutoff = 0.1,
                                            permutations = 1000,
                                            scoring = "combined",
                                            permutation.seed = 2, 
                                            contrast.term = x)
                       }, simplify = FALSE)

allSimple <- ct.regularizeContrasts(allResults)

Using the logical columns we can Identify relevant sets of targets. For example, the number of targets changing over time in both timecourses:

time.effect <- ct.compareContrasts(list("con" = allSimple$ControlTime,
                                        "tx" = allSimple$DeathOverTime))
summary(time.effect$replicated)

or targets with a toxicity modifying effect that compromise intrinsic viability:

mod.control <- ct.compareContrasts(list("con" = allSimple$ControlTime,
                                        "Interaction" = allSimple$Interaction),
                                 same.dir = c(TRUE, FALSE))
summary(mod.control$replicated)

More Comparisons

Sometimes you might be curious about the relationship between many contrasts. You can accomplish this by making an UpSet plot:

upset <- ct.upSet(allSimple)

In addition to constructing the UpSet plot above, by default we include fold enrichment and p-value estimates to help interpret the various bars in the context of the nominal or conditional expected values.

Finally, the above function returns a combination matrix containing the overlap values and associated targets, which can be useful for interrogating intersection sets of interest.

show(upset)

See the documentation about combination matrices provided in the ComplexHeatmap package for accessor functions and additional information about the structure and use of this object.

Ontological Enrichment

Though not subject to specific set-level analyses yet, the ct.seas() function can be likewise extended to use lists of results. The standardized objects can then be consolidated to ask broader questions about enrichment and depletion using standard methods:

genesetdb <- sparrow::getMSigGeneSetDb(collection = 'h', species = 'human', id.type = 'entrez')

sparrowList <- ct.seas(allSimple, gdb = genesetdb)

show(sparrowList)

#Can use returned matrices to facilitate downstream comparisons: 

plot(-log10(sparrow::results(sparrowList$DeathOverTime, 'fgsea')$padj), 
     -log10(sparrow::results(sparrowList$Interaction, 'fgsea')$padj), 
     pch = 19, col = rgb(0,0,0.8,0.5), 
     ylab = "Pathway -log10(P), Treatment Over Time",
     xlab = "Pathway -log10(P), Marginal Time Effect, Treatment Arm",
     main = 'Evidence for Pathway Enrichment')
abline(0,1,lty = 2)
sessionInfo()


RussBainer/gCrisprTools documentation built on Nov. 5, 2022, 2:35 p.m.