In OscarBrock/gCrisprTools: Suite of Functions for Pooled Crispr Screen QC and Analysis

This is a report detailing experiment-level properties of a pooled CRISPR screening experiment. For information about individual contrasts, please consult the dedicated contrast report.

Section 1: Experiment Overview

gRNA Alignment

In a high-quality library the overwhelming majority of reads will contain valid gRNA target sequences (>80%). Reads not matching the gRNA library may correspond to experimental contamination by lentivirus derived from other libraries. Rejected reads correspond to sequences not derived from the lentiviral cassette and may reflect problems with cassette isolation, library prep, or sequencing quality. Reads that "double match" contain apparent gRNA sequences in both mate pairs of a single paired-end read; this shouldn't happen given the structure of the lentiviral cassette, but may be rarely produced as an artifact of the library construction process.

if(!is.null(params$aln)){
  ct.alignmentChart(aln = params$aln, sampleKey = params$sampleKey)
} else {
  print("No alignment matrix provided.")
}

Raw gRNA Read Counts

In most CRISPR libraries the majority of gRNA abundances will form a relatively smooth distribution about the median count, often with a long tail. Sample libraries may be undersequenced if the raw number of read counts per gRNA is low, and PCR artifacts may be present if the sample contains a subset of extremely abundant guides, or if the distribution is jagged or contains multiple modes. In most well-executed experiments the majority of gRNAs in the control samples will form a tight distribution around some reasonably high average read count (e.g., hundreds of reads, X axis). Excessively low raw count values can compromise normalization steps and subsequent estimation of gRNA levels, which is typically a problem in screens where identification of disproportionately depleted gRNAs is important.

ct.rawCountDensities(eset = params$eset, sampleKey = params$sampleKey)

The Effect of Filtering on Read Density

Most pooled CRISPR screening experiments include gRNA cassettes whose abundance is extremely low, either because of low-level sample contamination from other libraries, because certain constructs influenbce the lentivirus' infection efficiency, or because the corresponding cassettes are simply lowly abundant in the original plasmid or lentiviral libraries. If a very large number of lowly abundant gRNAs are present in a screen, they can distort the overall gRNA distributions and destabilize normalization procedures. In the following images, the trimmed gRNA distributions are shown for each sample; after trimming, the abundances in each sample should form a gaussian-like distribution centered about the sample median. If some of the samples, and especially the control samples, are not of the expected shape it may be necessary to rerun the filtering algorithm with different parameter choices (see ct.filterReads() manual page for details). Note that the subsequent analyses reported in this document are conducted after low-abundance gRNAs have been omitted during this step.

filt.eset <- ct.filterReads(
  eset = params$eset,
  trim = params$trim,
  log2.ratio = params$log2.ratio,
  sampleKey = params$sampleKey,
  plot.it = TRUE
)

The Effect of Normalization on Read Density

The $\alpha$-RRA algorithm estimates the significance of a target's enrichment or depletion on the basis of the experiment-level ranks of the associated gRNA-level signals. Proper normalization can in principle improve the accuracy of the algorithm by eliminating sample-level biases that are likely to reflect technical artifacts rather than biological signals. In many screening experiments, the gRNA abundances in each sample should form a smooth distribution centered about the same median after normalization, and the overall distributions should be similar within replicate groups.

There is not currently a strong literature regarding normalization in pooled screening experiments, however, and to some extent the definition of successful normalization is linked to experimental expectations about the composition of the screening library and the outcome of the screen. The images below are provided to inform users about the effects of two major normalizing transformations on the sample count distributions to inform downstream processing decisions, but the specific choices are left up to the user. Consequently, the normalized data are not used elsewhere in this document.

scale.eset <- ct.normalizeGuides(
  eset = filt.eset,
  method = 'scale',
  annotation = params$annotation,
  sampleKey = params$sampleKey,
  lib.size = params$lib.size,
  plot.it = TRUE
)

slope.eset <- ct.normalizeGuides(
  eset = filt.eset,
  method = 'slope',
  annotation = params$annotation,
  sampleKey = params$sampleKey,
  lib.size = params$lib.size,
  plot.it = TRUE
)

GC Content and gRNA Abundance

GC content can influence PCR efficiency, and a strong relationship between GC content and gRNA abundance may be evidence of poor viral library quality or sample preparation in CRISPR screens. Ideally, there should be no clear relationship between GC content and a gRNA's measured abundance. Keep in mind that some libraries may contain important subsets of gRNAs that have systematic differences in GC content (e.g., nontargeting controls) which may affect this.

if (is.null(params$sampleKey)) {
  ct.GCbias(data.obj = filt.eset, 
            ann = params$annotation, 
            lib.size = params$lib.size
            )
} else {
   ct.GCbias(data.obj = filt.eset, 
            ann = params$annotation, 
            lib.size = params$lib.size,
            sampleKey = params$sampleKey
            )
}

Section 2: Library-level Distortion of gRNA Abundances

Pooled CRISPR screens are fundamentally dynamic experiments, in which the measured abundance of a particular gRNA is fundamentally dependent on the scale of selection observed throughout the rest of the screen. In general, rigorous statistical methods perform well in screens where most cells survive the selection event and the gRNA distributions remain relatively stable throughout the experiment (i.e., dropout screens); more heuristic approaches are preferable in screens where most cells are affected and the resulting library only contains a small subset of gRNAs of extremely high abundance. The primary goal of this section is to determine the extent to which selection has distorted your gRNA library to guide downstream analyses.

Ranked and Scaled gRNA Abundance Distributions

Strong selection in a CRISPR screen will distort the gRNAs such that the "slope" of the middle of the distribution changes relative to what is observed in the control samples. This can lead to spurious inferences of gRNA depletion in some cases, and it may be useful to correct this with the ct.normalizeBySlope() function in the gCrisprTools R package prior to downstream analyses. Alternatively, experiments involving treatments where most cells are affected may have failed if library-level distortion is not readily apparent. If provided, the locations of guides associated with a specified geneSymbol are indicated as diamonds along each sample distribution (typically nontargeting controls).

ct.gRNARankByReplicate(
  eset = filt.eset, 
  sampleKey = params$sampleKey, 
  annotation = params$annotation, 
  geneSymb = params$geneSymb,
  lib.size = params$lib.size
)

Cumulative Target-level Read Abundance

A simple way to quantify the extent of distortion in a CRISPR screen is to observe the proportion of reads within each library that are derived from gRNA cassettes targeting each genomic element. By ranking the targets by their abundance within the library (X axis) and plotting the cumulative proportion of reads that the top N targets represent (Y axis), we can tell if the treatment libraries essentially only contain cassettes targeting a small number of genes, or if most of the cells survive the selection step.

ct.guideCDF(
  eset = filt.eset,
  sampleKey = params$sampleKey,
  plotType = "Target",
  annotation = params$annotation
)

Nontargeting Control Guides

The behavior of nontargeting control guides can give some sense of the consistency of the gRNA abundances within experimental cultures and the likely effect of normalization on the gRNA abundance estimation. Note that it is not necessarily critical that the relative abundances of nontargeting gRNAs remain superficially identical across treatments, but extremely large changes may indicate that individual gRNA levels are not likely to be well estimated.

tryCatch(
  ct.viewControls(
    eset = filt.eset,
    annotation = params$annotation,
    geneSymb = params$geneSymb,
    sampleKey = params$sampleKey,
    lib.size = params$lib.size,
    normalize = FALSE
  ),
  error = function(e) {
    paste0("Failed to generate control plot: \n", 
           e$message)
  }
)
tryCatch(
  ct.viewControls(
    eset = filt.eset,
    annotation = params$annotation,
    geneSymb = params$geneSymb,
    sampleKey = params$sampleKey,
    normalize = TRUE, 
    lib.size = params$lib.size

  ),
  error = function(e) {
    paste0("Failed to generate control plot: \n", 
           e$message)
  }
)