stat.DESeq: Analysis: DESeq2 Analysis of pooled CRISPR NGS data
In caRpools: CRISPR AnalyzeR for Pooled CRISPR Screens

Description Usage Arguments Details Value Note Author(s) Examples

For the DESeq2 analysis implementation, the read counts of all sgRNAs for a given gene are first summed to increase the available read count. The DESeq2 analysis is then performed, which includes the estimation of size factors, variance stabilization using a parametric fit, and a Wald test for differences in log2 fold changes between the untreated and treated data. More information can be found in _Love et al._, [Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2](http://www.ncbi.nlm.nih.gov/pubmed/25516281), _Genome Biology_, 2014.
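The summation step described above can be sketched in base R. This is a minimal illustration, not the package's internal code: the data frame `counts`, its column names, and the read counts are made up for the example, and the DESeq2 step itself is not shown.

```r
# Hypothetical sgRNA-level read counts; the sgRNA names follow the
# gene_suffix convention that the default extractpattern expects.
counts <- data.frame(
  sgRNA     = c("AAK1_1", "AAK1_2", "ABL1_1", "ABL1_2", "ABL1_3"),
  untreated = c(40, 35, 20, 30, 25),
  treated   = c(30, 25, 22, 31, 27),
  stringsAsFactors = FALSE
)

# Derive the gene name from each sgRNA identifier.
counts$gene <- sub("^(.+?)_.+", "\\1", counts$sgRNA, perl = TRUE)

# Sum the read counts of all sgRNAs that target the same gene.
gene.counts <- aggregate(cbind(untreated, treated) ~ gene,
                         data = counts, FUN = sum)
print(gene.counts)
```

The resulting gene-level table has one row per gene, which is the form in which counts would then be handed to a differential analysis.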

stat.DESeq(untreated.list,treated.list,namecolumn=1, fullmatchcolumn=2,
agg.function=sum, extractpattern=expression("^(.+?)_.+"), sorting=FALSE,
sgRNA.pval = 0.01, filename.deseq="data", fitType="parametric", p.adjust="holm")

`untreated.list`	A list of data.frames of untreated, control samples. e.g. list(df.control1, df.control2)
`treated.list`	A list of data.frames of treated samples. e.g. list(df.treated1, df.treated2)
`namecolumn`	Column in which the target names are located, e.g. namecolumn=1 for the first column.
`fullmatchcolumn`	Column in which the read counts are located, e.g. fullmatchcolumn=2 for the second column.
`agg.function`	Function used to aggregate gene data from individual sgRNA data. By default, agg.function=sum, but any other function can be used, e.g. mean or median.
`extractpattern`	Regular expression used to extract the gene name from the sgRNA name. Make sure the gene name is captured by enclosing its part of the regular expression in parentheses (). The default value expression("^(.+?)_.+") looks for the gene name (.+?) in front of the separator _, followed by any characters .+, e.g. gene1_anything.
`sorting`	Defines whether the final output is sorted by the calculated p-value. By default, sorting=FALSE will return a table sorted by gene name.
`sgRNA.pval`	p-value threshold to count significant sgRNAs for each gene. Default 0.01 Value (numeric)
`filename.deseq`	Filename of raw DESeq2 data output. Default "data" Values (character)
`fitType`	See '?DESeq2'. Default "parametric" Values "parametric", "local", "mean"
`p.adjust`	Method to adjust p-values for multiple testing. See '?DESeq2'. Default "holm" Values see '?DESeq2'
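Two of the arguments above can be illustrated with base R alone. The sketch below is not taken from the package source: it uses a plain character string instead of the `expression()` wrapper, the sgRNA names and p-values are made up, and it assumes that the `p.adjust` argument takes one of the method names accepted by `stats::p.adjust`.

```r
# Plain character version of the default pattern. The capture group
# (.+?) holds the gene name; everything after the first "_" is dropped.
pattern <- "^(.+?)_.+"

# Hypothetical sgRNA identifiers.
sgrna.names <- c("gene1_anything", "AAK1_104_0", "ABL1_7")

# Replace each full name by the content of the first capture group.
gene.names <- sub(pattern, "\\1", sgrna.names, perl = TRUE)
print(gene.names)

# Holm correction of a small set of made-up p-values.
pvals <- c(0.01, 0.02, 0.03)
padj.holm <- p.adjust(pvals, method = "holm")
print(padj.holm)
```

Because the capture group is non-greedy, only the text before the first separator is kept, so sgRNA names with several underscores still map to a single gene name.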

none

stat.DESeq returns an object of a formal class that contains the gene names together with their calculated p-values. The returned object can be visualized using carpools.hitident (see ?carpools.hitident). The output is formatted as follows:

log2 fold change (MAP): condition untreated vs treated
Wald test p-value: condition untreated vs treated
DataFrame with 813 rows and 6 columns

	baseMean	log2FoldChange	lfcSE	stat	pvalue	padj
AAK1	73.90565	-0.23319491	0.2927459	-0.7965779	0.42569619	0.7018234
AATK	159.43350	-0.11312924	0.2740927	-0.4127408	0.67979655	0.8514905
ABI1	131.03013	-0.09915855	0.2693971	-0.3680758	0.71281670	0.8691949
ABL1	77.51711	0.07837768	0.3155477	0.2483862	0.80383562	0.9114121
ABL2	119.22621	-0.49412039	0.2846396	-1.7359507	0.08257254	0.3128525
...	...	...	...	...	...	...
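A table of this shape can be ranked and filtered with ordinary data frame operations. The following sketch re-enters the five rows shown above by hand into a plain `data.frame`; the object name `res` and the cutoff of 0.05 are assumptions for illustration and are not part of the package.

```r
# The five example rows from the output above, typed in manually.
res <- data.frame(
  gene           = c("AAK1", "AATK", "ABI1", "ABL1", "ABL2"),
  log2FoldChange = c(-0.23319491, -0.11312924, -0.09915855,
                     0.07837768, -0.49412039),
  pvalue         = c(0.42569619, 0.67979655, 0.71281670,
                     0.80383562, 0.08257254),
  padj           = c(0.7018234, 0.8514905, 0.8691949,
                     0.9114121, 0.3128525),
  stringsAsFactors = FALSE
)

# Rank genes by unadjusted p-value, smallest first.
ranked <- res[order(res$pvalue), ]
print(ranked$gene)

# Keep only genes below a hypothetical adjusted p-value cutoff.
hits <- ranked[ranked$padj < 0.05, ]
print(nrow(hits))
```

In this small excerpt no gene passes the cutoff, so `hits` is empty; with the full result table the same two lines give a ranked hit list.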

none

Jan Winter. DESeq2 was developed by the Wolfgang Huber lab (EMBL, Heidelberg).

data(caRpools)
data.deseq = stat.DESeq(untreated.list = list(CONTROL1, CONTROL2),
  treated.list = list(TREAT1,TREAT2), namecolumn=1,
  fullmatchcolumn=2, extractpattern=expression("^(.+?)(_.+)"),
  sorting=FALSE, filename.deseq = "ANALYSIS-DESeq2-sgRNA.tab",
  fitType="parametric")
  
knitr::kable(data.deseq$genes[1:10,])