This is a companion vignette briefly describing the advanced capabilites available for comparing, summarizing, and integrating screening contrasts with gCrisprTools, introduced with version 2.0.
Historically, gCrisprTools was focused on making results data.frames, which exhaustively summarize the results of a screen contrast. They look like this:
library(Biobase) library(gCrisprTools) knitr::opts_chunk$set(message = FALSE, fig.width = 8, fig.height = 8) data(resultsDF) head(resultsDF)
In many applications, there are limitations to these objects. For example:
colnames
are human-readable instead of machine readablegeneID
and geneSymbol
is not clearTo help with downstream analyses, we introduce the simpleResult
object
res <- ct.simpleResult(resultsDF, collapse = 'geneSymbol') head(res)
These objects are simpler representations of target signals that are applicable for "most" situations (e.g., when the gRNAs associated with a target have similar effects). They are:
This format enables better comparisons between screens, and we have some helper functions to support this:
# Make "another" result res2 <- res res2$best.p <- res2$best.p * runif(nrow(res2)) res2$direction[sample(1:nrow(res), 500)] <- res2$direction[sample(1:nrow(res), 500)] regularized <- ct.regularizeContrasts(dflist = list('Experiment1' = res[1:1500,], 'Experiment2' = res2[1:1900,]), collapse = 'geneSymbol') str(regularized)
The above function takes in a (named) list of results objects and creates a list of simpleResult
s with the same names that are "in register" with one another, using the shared elements specified by collapse
.
The main utility of this function is twofold. First, it enables comparison of lightweight results objects that are derived from screens that could have been executed with different libraries, systems, or technologies by focusing on the targets and their directional significance measures. Second, this "named list of standardized results" can become a standard structure for encoding grouped contrasts for comparison purposes, allowing us to do more sophisticated comparisons between and across contrasts.
The standardized format allows straightforward identification of signal overlaps observed within multiple screens via ct.compareContrasts()
. By default, these comparisons are done conditionally, but users may specify more rigid criteria.
comparison <- ct.compareContrasts(dflist = regularized, statistics = c('best.q', 'best.p'), cutoffs = c(0.5,0.05), same.dir = rep(TRUE, length(regularized))) head(comparison, 30)
This function returns a simplifiedResult
version of the mainresult
argument, appended with a logical column indicating whether signals in the mainresult
contrast passing the first significance cutoff are replicated in the validationresult
contrast at the second significance cutoff. This sort of conditional scoring was shown previously to be useful in interpreting the results of validation and counterscreens.
For convenience, we can also return summary statistics characterizing the overlap between the screens:
ct.compareContrasts(dflist = regularized, statistics = c('best.q', 'best.p'), cutoffs = c(0.5,0.05), same.dir = rep(TRUE, length(regularized)), return.stats = TRUE)
This can be useful when exploring appropriate cutoff thresholds, or when asking broader questions about overall congruence.
We provide a number of methods for visualizing and contextualizing the overall signals present within sets of screens. One of the simplest is the cointrast barcharts, which represent the number of observed enrichment and depletion signals in each screen according to specified significance criteria:
ct.contrastBarchart(regularized, background = FALSE, statistic = 'best.p')
In the above figure, each contrast is represented as a horizontal bar, and targets enriched and depleted are represented as bars extending to the right and the left of the vertical dotted line, respectively.
We can also compare the signals observed in two screens directly in a scatterplot:
scat <- ct.scatter(regularized, targets = 'geneSymbol', statistic = 'best.p', cutoff = 0.05)
This provides a scatter plot of the indicated statistic, with quadrants defined according to the user-specified cutoff. The number of targets in each of the quadrants is indicated in grey, and quadrants are keyed like this:
1 2 3 4 5 6 7 8 9
A more complete genewise picture can be achieved by examining the returned invisible object, which appends the relevant quadrants to assist in in focusing on particular targets of interest:
head(scat)
As an aside, this simplified infrastructure allows straightforward dissection of interaction effects as well. Often, it is helpful to identify targets that are enriched or depleted with respect to one contrast but inactive with respect to another (e.g., to identify genes that impact a screening phenotype but do not deplete over time).
library(Biobase) library(limma) library(gCrisprTools) #Create a complex model design; removing the replicate term for clarity data("es", package = "gCrisprTools") data("ann", package = "gCrisprTools") design <- model.matrix(~ 0 + TREATMENT_NAME, pData(es)) colnames(design) <- gsub('TREATMENT_NAME', '', colnames(design)) contrasts <-makeContrasts(ControlTime = ControlExpansion - ControlReference, DeathOverTime = DeathExpansion - ControlReference, Interaction = DeathExpansion - ControlExpansion, levels = design) es <- ct.normalizeGuides(es, method = "scale") #See man page for other options vm <- voom(exprs(es), design) fit <- lmFit(vm, design) fit <- contrasts.fit(fit, contrasts) fit <- eBayes(fit) allResults <- sapply(colnames(fit$coefficients), function(x){ ct.generateResults(fit, annotation = ann, RRAalphaCutoff = 0.1, permutations = 1000, scoring = "combined", permutation.seed = 2, contrast.term = x) }, simplify = FALSE) allSimple <- ct.regularizeContrasts(allResults)
Using the logical columns we can Identify relevant sets of targets. For example, the number of targets changing over time in both timecourses:
time.effect <- ct.compareContrasts(list("con" = allSimple$ControlTime, "tx" = allSimple$DeathOverTime)) summary(time.effect$replicated)
or targets with a toxicity modifying effect that compromise intrinsic viability:
mod.control <- ct.compareContrasts(list("con" = allSimple$ControlTime, "Interaction" = allSimple$Interaction), same.dir = c(TRUE, FALSE)) summary(mod.control$replicated)
Sometimes you might be curious about the relationship between many contrasts. You can accomplish this by making an UpSet plot:
upset <- ct.upSet(allSimple)
In addition to constructing the UpSet plot above, by default we include fold enrichment and p-value estimates to help interpret the various bars in the context of the nominal or conditional expected values.
Finally, the above function returns a combination matrix containing the overlap values and associated targets, which can be useful for interrogating intersection sets of interest.
show(upset)
See the documentation about combination matrices provided in the ComplexHeatmap
package for accessor functions and additional information about the structure and use of this object.
Though not subject to specific set-level analyses yet, the ct.seas()
function can be likewise extended to use lists of results. The standardized objects can then be consolidated to ask broader questions about enrichment and depletion using standard methods:
genesetdb <- sparrow::getMSigGeneSetDb(collection = 'h', species = 'human', id.type = 'entrez') sparrowList <- ct.seas(allSimple, gdb = genesetdb) show(sparrowList) #Can use returned matrices to facilitate downstream comparisons: plot(-log10(sparrow::results(sparrowList$DeathOverTime, 'fgsea')$padj), -log10(sparrow::results(sparrowList$Interaction, 'fgsea')$padj), pch = 19, col = rgb(0,0,0.8,0.5), ylab = "Pathway -log10(P), Treatment Over Time", xlab = "Pathway -log10(P), Marginal Time Effect, Treatment Arm", main = 'Evidence for Pathway Enrichment') abline(0,1,lty = 2)
sessionInfo()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.