In tgac-vumc/CSSA: Analysis and Simulation Tools for CRISPR-Cas9 Pooled Screens

degrep

R Documentation

Derive growth-modifying effect of gene knockout in pooled experiments with replicate arms

Description

degrep is a variant of getdeg that uses confidence measures of rate ratios to find the "best guide". First, the median rate ratio of a group (e.g. a gene) is determined. The best guide is the one with the most extreme rate ratio in the same direction (sign) as the median, after each rate ratio has been moved toward the null by two (or another specified number of) standard errors. See Details below. For more context, see getdeg.
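The selection rule can be sketched in R for the guides of a single gene. This is a minimal illustration under stated assumptions, not the CSSA implementation: `pick_best_guide` is a hypothetical helper, the null is taken to be 0 (the default hnull), and shifted rate ratios are not clipped at zero.

```r
# Hypothetical helper (not part of CSSA): pick the best guide of one gene.
# r:   log2 rate ratios of the gene's guides
# se:  their standard errors
# nse: number of standard errors to move each ratio toward the null
# Assumption: the null is 0, and shifted ratios may cross zero (no clipping).
pick_best_guide <- function(r, se, nse = 2) {
  direction <- sign(median(r))        # direction of the gene's median effect
  shrunk <- r - direction * nse * se  # move each ratio toward 0 by nse * se
  score <- direction * shrunk         # larger score = more extreme in that direction
  which.max(score)                    # within-gene index of the best guide
}

# Guide 1: -4 (se 1.2), guide 2: -3 (se 0.5), guides 3 and 4 near zero
pick_best_guide(c(-4, -3, 0.1, -0.1), c(1.2, 0.5, 0.3, 0.3))
```

With nse = 2, guide 1 is shifted to -1.6 and guide 2 to -2, so guide 2 is selected; with nse = 0 the rule reduces to picking the raw most extreme ratio (guide 1). If the median is exactly 0, every score is 0 and this sketch simply returns the first guide.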

Usage

degrep(
  guides,
  r0,
  se0,
  r1,
  se1,
  rt = FALSE,
  set = FALSE,
  a,
  b,
  hnull = 0,
  nse = 2,
  secondbest = TRUE,
  correctab = TRUE
)

Arguments

`guides`	Character vector. Guides are assumed to start with the gene name, followed by an underscore, followed by a number or sequence unique within that gene.
`r0`	Numeric vector. Log2-transformed rate ratios of features representing straight lethality.
`se0`	Numeric vector. Standard errors corresponding to r0.
`r1`	Numeric vector. Log2-transformed rate ratios of features representing sensitization or synthetic lethality. Optional, but required to calculate e.
`se1`	Numeric vector. Standard errors corresponding to r1.
`rt`	Numeric vector. Log2-transformed rate ratios of features representing lethality in the test sample. Optional.
`set`	Numeric vector. Standard errors corresponding to rt.
`a`	Numeric. Estimated potential population doublings between time points.
`b`	Numeric. Estimated potential population doublings between time points in test sample. Only applicable if r1 is given. If omitted, assumed equal to a.
`hnull`	Numeric. Null hypothesis. Growth effects of genes are tested to be more extreme than this value. Setting hnull can greatly improve the usefulness of p-values; hnull can be considered a cutoff for relevance. Default = 0
`nse`	Numeric. Number of standard errors by which each guide's rate ratio is moved toward the null before guides are compared. See Details below. Default = 2
`secondbest`	Logical. If TRUE, calculate effect sizes based on the second best guides of each gene as well. Default = TRUE
`correctab`	Logical. When `a != b`, it is possible, and perhaps necessary, to correct for this difference mathematically. If you analyze an experiment with unequal a and b, try the analysis both with and without correction. Default = TRUE
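The naming convention assumed for `guides` can be illustrated with base R. The guide IDs below are hypothetical, and this is only a sketch of how such IDs map to genes, not how degrep parses them internally. Stripping from the last underscore keeps gene names that themselves contain an underscore intact, assuming the per-guide suffix contains none.

```r
# Hypothetical guide IDs following the "<gene>_<suffix>" convention
guides <- c("TP53_1", "TP53_2", "KRAS_ACGT", "KRAS_TTGA", "MYC_1")

# Drop the last underscore and everything after it to recover the gene name
genes <- sub("_[^_]*$", "", guides)

# Number of guides per gene, in order of first appearance
n <- as.vector(table(factor(genes, levels = unique(genes))))
```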

Details

For details on the basic functionality, see the getdeg documentation. The functionality added in this function hinges on confidence measures of rate ratios. Rate ratios and their standard errors can be derived from other sources, but I recommend using the output of rrep. As in getdeg, a single best guide is determined using all available data, but now the confidence given by the standard error is also used to select it.

Here is an example to illustrate. Say guide 1 has a log2 rate ratio of -4 and a standard error of 1.2, guide 2 targeting the same gene has a log2 rate ratio of -3 and a standard error of 0.5, and guides 3 and 4 both have little effect. With the default nse = 2, guide 1 is moved to -4 + 2 × 1.2 = -1.6 and guide 2 to -3 + 2 × 0.5 = -2, so guide 2 scores better than guide 1 and is designated the "best guide".

For the p-value, however, the lowest p-value among the gene's guides is reported. Each guide's p-value is calculated from its rate ratio, its standard error, and the null hypothesis as determined by hnull and the number of population doublings. The p-value reported for a gene therefore does not necessarily correspond to the best guide, and can in fact belong to an outlier. The p-values are also not corrected for multiple testing. Instead, p-values can easily be calculated for all guides using `2*pnorm(-abs(r)/se)` or `pnorm((hnull*a-abs(r))/se)`, and corrected for multiple testing using `p.adjust`.
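The guide-level p-value formulas from the paragraph above can be applied directly in base R. This is a sketch with hypothetical values: a = 6 and hnull = 0.05 are illustrative, hnull is assumed to be given as a non-negative magnitude (the formula compares it with `abs(r)`), and Benjamini-Hochberg is just one choice of `p.adjust` method. In a real screen the adjustment would run over all guides, not the four of one gene.

```r
# Hypothetical log2 rate ratios and standard errors for four guides of one gene
r  <- c(-4, -3, 0.1, -0.1)
se <- c(1.2, 0.5, 0.3, 0.3)
a     <- 6     # illustrative number of population doublings
hnull <- 0.05  # illustrative relevance cutoff (assumed non-negative magnitude)

# Two-sided p-values against "no effect"
p_two <- 2 * pnorm(-abs(r) / se)

# One-sided p-values against the cutoff hnull * a on the rate-ratio scale
p_cut <- pnorm((hnull * a - abs(r)) / se)

# Correct for multiple testing across guides (Benjamini-Hochberg chosen here)
p_adj <- p.adjust(p_two, method = "BH")

# Lowest guide-level p-value, analogous to the gene-level value described above
p_gene <- min(p_cut)
```

A positive cutoff makes every one-sided p-value larger than it would be with hnull = 0, which is how hnull filters out effects too small to be relevant.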

Value

Returns a list with the following (depending on input arguments):

genes - list of all gene symbols
n - number of guides representing the gene
d - gene knockout effects on straight lethality
d2 - gene knockout effects on straight lethality based on the second-best guide
e - gene knockout effects on sensitization
e2 - gene knockout effects on sensitization based on the second-best guide
de - gene knockout effects on straight lethality in the test arm
de2 - gene knockout effects on straight lethality in the test arm based on the second-best guide
g - estimated guide efficacy
i - within-gene index of the best guide
j - within-gene index of the second-best guide
pd - p-value of straight lethality
pe - p-value of sensitization
pde - p-value of lethality in the test arm
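The within-gene indices i (and likewise j) can be mapped back to guide IDs. The sketch below uses hypothetical guides and hypothetical index values, and assumes that i is 1-based, counts guides in the order they appear in `guides`, and lists genes in order of first appearance; check these assumptions against the actual output before relying on them.

```r
# Hypothetical guides, grouped by gene as in the input to degrep
guides <- c("TP53_1", "TP53_2", "TP53_3", "KRAS_1", "KRAS_2")
genes  <- sub("_[^_]*$", "", guides)

# Guides of each gene, with genes kept in order of first appearance
by_gene <- split(guides, factor(genes, levels = unique(genes)))

# Hypothetical within-gene indices of the best guide (one per gene)
i <- c(2, 1)

# Pick the i-th guide of each gene; the result is named by gene
best <- mapply(function(g, k) g[k], by_gene, i)
```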

Note

With these analyses, it is important to visually inspect all steps, and preferably to analyze a data set with several settings.

Author(s)

Jos B. Poell

Examples

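library(CSSA)

# Untreated (ut) and treated (tr, simulated with e = TRUE) arms, three replicates each.
# allseed is shared by all arms; t0seed is shared within each ut/tr pair.
ut1 <- CRISPRsim(1000, 4, a = c(3,3), allseed = 100, t0seed = 10, 
                 repseed = 1, perfectseq = TRUE)
tr1 <- CRISPRsim(1000, 4, a = c(3,3), e = TRUE, allseed = 100, t0seed = 10, 
                 repseed = 2, perfectseq = TRUE)
ut2 <- CRISPRsim(1000, 4, a = c(3,3), allseed = 100, t0seed = 20, 
                 repseed = 3, perfectseq = TRUE)
tr2 <- CRISPRsim(1000, 4, a = c(3,3), e = TRUE, allseed = 100, t0seed = 20, 
                 repseed = 4, perfectseq = TRUE)
ut3 <- CRISPRsim(1000, 4, a = c(3,3), allseed = 100, t0seed = 30, 
                 repseed = 5, perfectseq = TRUE)
tr3 <- CRISPRsim(1000, 4, a = c(3,3), e = TRUE, allseed = 100, t0seed = 30, 
                 repseed = 6, perfectseq = TRUE)
# Guides with near-zero simulated d and e, used as the normalization subset
cgi <- tr1$d > -0.05 & tr1$d < 0.025 & tr1$e > -0.05 & tr1$e < 0.025
# r0: untreated t6 versus t0 (straight lethality)
rr0 <- rrep(cbind(ut1$t6, ut2$t6, ut3$t6), cbind(ut1$t0, ut2$t0, ut3$t0), normsubset = cgi)
# r1: treated t6 versus untreated t6 (sensitization)
rr1 <- rrep(cbind(tr1$t6, tr2$t6, tr3$t6), cbind(ut1$t6, ut2$t6, ut3$t6), normsubset = cgi)
# a = b = 6 population doublings (a = c(3,3) in the simulation, sampled at t6)
deg <- degrep(ut1$guides, rr0$r, rr0$se, rr1$r, rr1$se, a = 6, b = 6, secondbest = FALSE)
# Collapse runs of identical per-guide values into one true value per gene
reald <- rle(tr1$d)$values
reale <- rle(tr1$e)$values
# Compare true and estimated gene effects
plot(reald, deg$d)
plot(reale, deg$e)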

tgac-vumc/CSSA documentation built on Oct. 10, 2022, 7:27 p.m.