Analysis: Analysis of pooled CRISPR screening data using a Wilcoxon Test

Description

__Wilcox__

In this approach, the read counts of all sgRNAs in one dataset are first normalized by the function set in the MIACCS file. By default, each read count is divided by the dataset median. Then, for each gene, the fold changes of its sgRNAs are tested against the fold changes of a reference population using a two-sided Mann-Whitney U test. The reference population consists of either the non-targeting controls or randomly picked sgRNAs, as defined by the random picks option within the MIACCS file. P-values are corrected for multiple testing by controlling the false discovery rate (FDR).
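The steps above can be sketched in base R. This is a minimal illustration on simulated counts and not the package's actual implementation: the toy data, the control name "random", the use of the mean to summarize the fold change per gene, and the choice of method = "BH" for the FDR correction are all assumptions made for the example.

```r
set.seed(1)

# Toy data (assumption): 6 genes with 4 sgRNAs each, plus 20 non-targeting
# controls that share the arbitrary gene name "random"
genes   <- rep(c(paste0("GENE", 1:6), "random"), times = c(rep(4, 6), 20))
sgrna   <- paste0(genes, "_sg", seq_along(genes))
untreat <- data.frame(name = sgrna, count = rpois(length(sgrna), 200))
treat   <- data.frame(name = sgrna, count = rpois(length(sgrna), 200))

# Step 1: normalize each dataset by dividing read counts by the dataset median
norm.u <- untreat$count / median(untreat$count)
norm.t <- treat$count / median(treat$count)

# Step 2: fold change per sgRNA (treated over untreated)
fc <- norm.t / norm.u

# Step 3: two-sided Mann-Whitney U test of each gene's sgRNAs against the controls
is.ctrl      <- genes == "random"
target.genes <- setdiff(unique(genes), "random")
p.raw <- sapply(target.genes, function(g) {
  wilcox.test(fc[genes == g], fc[is.ctrl], alternative = "two.sided")$p.value
})

# Step 4: correct for multiple testing (Benjamini-Hochberg is assumed here)
p.adj <- p.adjust(p.raw, method = "BH")

result <- data.frame(
  foldchange = as.vector(tapply(fc, genes, mean)[target.genes]),
  p.value    = as.vector(p.adj),
  row.names  = target.genes
)
print(result)
```

Because the toy counts are drawn from the same distribution for both conditions, no gene is expected to stand out; the sketch only shows the order of operations.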

Usage

stat.wilcox(untreated.list = list(NULL, NULL), treated.list = list(NULL, NULL),
  namecolumn = 1, fullmatchcolumn = 2, normalize = TRUE, norm.fun = median,
  extractpattern = expression("^(.+?)_.+"), controls = NULL, control.picks = 300,
  sorting = TRUE)

Arguments

untreated.list

A list of data.frames of untreated, control samples. e.g. list(df.control1, df.control2)

treated.list

A list of data.frames of treated samples. e.g. list(df.treated1, df.treated2)

namecolumn

Column in which the target names are located, e.g. namecolumn=1 for the first column.

fullmatchcolumn

Column in which the read counts are located, e.g. fullmatchcolumn=2 for the second column.

normalize

Datasets can be normalized by norm.fun if normalize=TRUE.

norm.fun

The function used to normalize the datasets if normalize=TRUE. By default, normalization is done using the dataset median, but in principle any other function, e.g. mean, can be used.

extractpattern

Regular expression used to extract the gene name from the sgRNA name. Make sure the gene name is captured by enclosing its part of the expression in parentheses (). The default value expression("^(.+?)_.+") looks for the gene name (.+?) in front of the separator _, followed by any characters .+, e.g. gene1_anything.

controls

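stat.wilcox uses a set of non-targeting sgRNAs (negative controls) within the datasets as the reference population. You can specify the arbitrary gene name for these controls using controls="arbitrary.gene.name.of.controls". If controls=NULL, randomly picked sgRNAs are used instead (see control.picks).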

sorting

If sorting=FALSE, the analysis output is sorted by gene name. If sorting=TRUE (the default shown under Usage), the output table is sorted by the p-value of the genes.

control.picks

If no non-targeting controls are present or set, stat.wilcox picks a random set of sgRNAs from the dataset as the alternative population; control.picks sets how many sgRNAs are picked. This is only used if controls=NULL. Default: 300. Values: numeric.
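As an illustration of the extractpattern and control.picks arguments, the following base R sketch shows how a pattern of this form maps sgRNA names to gene names and how a random reference population could be drawn. The sgRNA identifiers are hypothetical, the pattern is passed as a plain string rather than an expression, and the sampling is an assumption about the general idea rather than the package's internal code.

```r
# Hypothetical sgRNA identifiers following a gene_suffix naming scheme
sgrna.names <- c("AAK1_1", "AAK1_2", "ABL1_1", "ABL1_2", "ABL2_1", "ABL2_2")

# The capture group (.+?) keeps everything in front of the first underscore
pattern    <- "^(.+?)_.+"
gene.names <- sub(pattern, "\\1", sgrna.names, perl = TRUE)
print(gene.names)

# With controls=NULL, a random subset of sgRNAs serves as the reference
# population; control.picks is kept small here to fit the toy data
set.seed(42)
control.picks <- 3
reference <- sample(sgrna.names, size = min(control.picks, length(sgrna.names)))
print(reference)
```

If the sgRNA names use a different separator, only the pattern needs to change, as long as the gene name stays inside the parentheses.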

Value

stat.wilcox returns a data.frame, which can be visualized by plot.hitident. The data.frame has the following format:

     untreated  treated foldchange   p.value
AAK1  2.061346 3.007924   1.351672 0.2966311
AATK  3.413357 5.129985   1.398695 0.1146190
ABI1  2.997385 4.384881   1.418959 0.1437962
ABL1  2.269906 2.874087   1.211499 0.3681327
ABL2  2.519391 4.539583   1.732575 0.6335575

For each gene (row name), the table lists the fold change and the p-value derived by the Mann-Whitney U test against the non-targeting controls.
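A short sketch of how such a table can be handled afterwards, using the five rows shown above. The object name res and the cutoff of 0.05 are assumptions made for the example.

```r
# Rebuild the example table from the values listed above
res <- data.frame(
  untreated  = c(2.061346, 3.413357, 2.997385, 2.269906, 2.519391),
  treated    = c(3.007924, 5.129985, 4.384881, 2.874087, 4.539583),
  foldchange = c(1.351672, 1.398695, 1.418959, 1.211499, 1.732575),
  p.value    = c(0.2966311, 0.1146190, 0.1437962, 0.3681327, 0.6335575),
  row.names  = c("AAK1", "AATK", "ABI1", "ABL1", "ABL2")
)

# Order genes by ascending p-value, the ordering described for sorting=TRUE
res.sorted <- res[order(res$p.value), ]
print(res.sorted)

# Genes below a hypothetical significance cutoff (none in this excerpt)
hits <- rownames(res)[res$p.value < 0.05]
print(hits)
```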

Note

none

Author(s)

Jan Winter

Examples

data(caRpools)

data.wilcox = stat.wilcox(untreated.list = list(CONTROL1, CONTROL2),
  treated.list = list(TREAT1,TREAT2), namecolumn=1, fullmatchcolumn=2,
  normalize=TRUE, norm.fun=median, sorting=FALSE, controls="random",
  control.picks=NULL)
  
knitr::kable(data.wilcox[1:10,])
