degrep | R Documentation |
degrep is a variant of getdeg that utilizes confidence measures of rate
ratios to find the "best guide". First, the median rate ratio of a group
(e.g. a gene) is determined. The best guide has the most extreme rate ratio
with the same sign (direction) as the median, after moving two (or another
specified number of) standard error units toward the null. See details
below. For more context, see getdeg.
degrep( guides, r0, se0, r1, se1, rt = FALSE, set = FALSE, a, b, hnull = 0, nse = 2, secondbest = TRUE, correctab = TRUE )
guides |
Character vector. Guides are assumed to start with the gene name, followed by an underscore, followed by a number or sequence unique within that gene. |
r0 |
Numeric vector. Log2-transformed rate ratios of features representing straight lethality. |
se0 |
Numeric vector. Standard errors corresponding to r0. |
r1 |
Numeric vector. Log2-transformed rate ratios of features representing sensitization or synthetic lethality. Optional but required to calculate e. |
se1 |
Numeric vector. Standard errors corresponding to r1. |
rt |
Numeric vector. Log2-transformed rate ratios of features representing lethality in the test sample. Optional. |
set |
Numeric vector. Standard errors corresponding to rt. |
a |
Numeric. Estimated potential population doublings between time points. |
b |
Numeric. Estimated potential population doublings between time points in test sample. Only applicable if r1 is given. If omitted, assumed equal to a. |
hnull |
Numeric. Null hypothesis. Growth effects of genes are tested to be more extreme than this value. Setting hnull can greatly improve the usefulness of p-values, and can be considered a cutoff for relevance. Default = 0 |
nse |
Numeric. Number of standard units, used for comparing guides. See details below. Default = 2 |
secondbest |
Logical. If TRUE, calculate effect sizes based on the second best guides of each gene as well. Default = TRUE |
correctab |
Logical. When |
For more details on basic functionality, see the getdeg documentation. The
added functionality in this function hinges on the use of confidence
measures of rate ratios. Rate ratios and associated errors can be derived
from other sources, but I recommend using the output of rrep. As in getdeg,
a single best guide is determined using all available data. But now, the
confidence given by the standard error is used to help select the best
guide. Here is an example to illustrate. Say guide 1 has a rate ratio of -4
and a standard error of 1.2, while guide 2, targeting the same gene, has a
rate ratio of -3 and a standard error of 0.5, and guides 3 and 4 both have
little effect. When using the default nse = 2, guide 2 scores better than
guide 1 (-3 + 2*0.5 = -2 versus -4 + 2*1.2 = -1.6), and is thus designated
"best guide". However, for the p-value calculation, the lowest p-value is
reported, which is calculated using the rate ratio, its standard error, and
the null hypothesis as determined by hnull and the number of population
doublings. The p-value reported for a gene therefore does not necessarily
correspond to the best guide, and can in fact belong to an outlier. The
p-values are also not corrected for multiple testing. Instead, p-values can
easily be calculated for all guides using 2*pnorm(-abs(r)/se) or
pnorm((hnull*a-abs(r))/se), and corrected for multiple testing using
p.adjust.
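The guide comparison and the per-guide p-values described above can be sketched in base R. This is an illustration only, not the code degrep runs internally. The guide names, the values for guides 3 and 4, hnull = 0.1, a = 6, the choice to stop shrinking at zero, and the "BH" correction method are all assumptions made for the example.

```r
# Toy data: guides 1 and 2 follow the example in the text;
# guide names and the values for guides 3 and 4 are made up.
guides <- c("GENE1_1", "GENE1_2", "GENE1_3", "GENE1_4")
r  <- c(-4, -3, -0.2, 0.1)   # log2 rate ratios
se <- c(1.2, 0.5, 0.4, 0.5)  # standard errors
nse <- 2

# Assumption: move each rate ratio nse standard errors toward 0,
# but never past 0.
shrunk <- sign(r) * pmax(abs(r) - nse * se, 0)

# The direction of the gene is the sign of the median rate ratio.
direction <- sign(median(r))

# Best guide: most extreme shrunken value in the direction of the median.
best <- which.max(direction * shrunk)
guides[best]

# Per-guide p-values against a null of 0, using the formula from the text.
p0 <- 2 * pnorm(-abs(r) / se)

# Per-guide p-values against a non-zero null (hnull and a are assumed values).
hnull <- 0.1
a <- 6
ph <- pnorm((hnull * a - abs(r)) / se)

# Multiple-testing correction; "BH" is one of several methods p.adjust offers.
padj <- p.adjust(p0, method = "BH")
```

Here guide 1 shrinks to -1.6 and guide 2 to -2, so guide 2 is selected, in line with the example in the text.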
Returns a list with the following (depending on input arguments):
genes - list of all gene symbols
n - number of guides representing the gene
d - gene knockout effects on straight lethality
d2 - gene knockout effects on straight lethality based on the second-best guide
e - gene knockout effects on sensitization
e2 - gene knockout effects on sensitization based on the second-best guide
de - gene knockout effects on straight lethality in the test arm
de2 - gene knockout effects on straight lethality in the test arm based on the second-best guide
g - estimated guide efficacy
i - within-gene index of the best guide
j - within-gene index of the second-best guide
pd - p-value of straight lethality
pe - p-value of sensitization
pde - p-value of lethality in the test arm
With these analyses, it is important to visually inspect all steps, and preferably to analyze a data set with several settings.
Jos B. Poell
getdeg, rrep
# Simulate three untreated (ut) and three treated (tr) replicates
ut1 <- CRISPRsim(1000, 4, a = c(3,3), allseed = 100, t0seed = 10,
                 repseed = 1, perfectseq = TRUE)
tr1 <- CRISPRsim(1000, 4, a = c(3,3), e = TRUE, allseed = 100, t0seed = 10,
                 repseed = 2, perfectseq = TRUE)
ut2 <- CRISPRsim(1000, 4, a = c(3,3), allseed = 100, t0seed = 20,
                 repseed = 3, perfectseq = TRUE)
tr2 <- CRISPRsim(1000, 4, a = c(3,3), e = TRUE, allseed = 100, t0seed = 20,
                 repseed = 4, perfectseq = TRUE)
ut3 <- CRISPRsim(1000, 4, a = c(3,3), allseed = 100, t0seed = 30,
                 repseed = 5, perfectseq = TRUE)
tr3 <- CRISPRsim(1000, 4, a = c(3,3), e = TRUE, allseed = 100, t0seed = 30,
                 repseed = 6, perfectseq = TRUE)

# Guides with d and e close to zero, used as normalization subset
cgi <- tr1$d > -0.05 & tr1$d < 0.025 & tr1$e > -0.05 & tr1$e < 0.025

# Rate ratios with standard errors: t6 vs t0, and treated vs untreated
rr0 <- rrep(cbind(ut1$t6, ut2$t6, ut3$t6), cbind(ut1$t0, ut2$t0, ut3$t0),
            normsubset = cgi)
rr1 <- rrep(cbind(tr1$t6, tr2$t6, tr3$t6), cbind(ut1$t6, ut2$t6, ut3$t6),
            normsubset = cgi)

deg <- degrep(ut1$guides, rr0$r, rr0$se, rr1$r, rr1$se, a = 6, b = 6,
              secondbest = FALSE)

# Compare simulated gene effects with the estimates
reald <- rle(tr1$d)$values
reale <- rle(tr1$e)$values
plot(reald, deg$d)
plot(reale, deg$e)