ccr.perf_statTests: CRISPRcleanR correction assessment: Statistical tests
In francescojm/CRISPRcleanR: Unsupervised correction of gene independent cell responses to CRISPR-cas9 targeting

ccr.perf_statTests

R Documentation

CRISPRcleanR correction assessment: Statistical tests

Description

This function tests whether the log fold changes of sgRNAs targeting different sets of genes differ significantly from the background, before and after CRISPRcleanR correction. It produces two sets of boxplots showing the outcomes and returns the corresponding statistical indicators.

Usage

ccr.perf_statTests(cellLine, libraryAnnotation, correctedFCs,
                   outDir = "./",
                   GDSC.geneLevCNA = NULL,
                   CCLE.gisticCNA = NULL,
                   RNAseq.fpkms = NULL,
                   GDSC.CL_annotation = NULL,
                   verbose = c(-1, 0, 1))

Arguments

`cellLine`	A string specifying the name of a cell line (or a COSMIC identifier [1]).
`libraryAnnotation`	The sgRNA library annotations, formatted as specified in the reference manual entry of the built-in `KY_Library_v1.0` library.
`correctedFCs`	sgRNA log fold changes corrected for gene-independent responses to CRISPR-Cas9 targeting, generated with the function `ccr.GWclean` (i.e. `corrected_logFCs`, the first data frame in the list output by `ccr.GWclean`).
`outDir`	The path of the folder where the boxplots will be saved.
`GDSC.geneLevCNA`	Genome-wide copy number data in the same format as the built-in `GDSC.geneLevCNA` data frame. This can be assembled from the spreadsheet listed in the Source section [a], which contains data for the GDSC1000 cell lines. If NULL, this function uses the built-in `GDSC.geneLevCNA` data frame, containing data derived from [a] for the 15 cell lines used in [2] to assess the performance of CRISPRcleanR.
`CCLE.gisticCNA`	Genome-wide GISTIC [3] scores quantifying copy number status across cell lines, in the same format as the built-in `CCLE.gisticCNA` data frame. If NULL, this function uses the built-in `CCLE.gisticCNA` data frame, containing data for 13 of the 15 cell lines used in [2] to assess the performance of CRISPRcleanR.
`RNAseq.fpkms`	Genome-wide gene expression levels, in fragments per kilobase of exon per million reads mapped (FPKM), across cell lines. These can be derived from the comprehensive collection of RNAseq profiles described in [4]. The format must be the same as that of the built-in `RNAseq.fpkms` data frame. If NULL, this function uses the built-in `RNAseq.fpkms` data frame, containing data for the 15 cell lines used in [2] to assess CRISPRcleanR results.
`GDSC.CL_annotation`	Cell line annotation data frame with the same structure as the built-in `GDSC.CL_annotation`. If NULL, the built-in `GDSC.CL_annotation` is used.
`verbose`	Numeric value determining the level of detail of the messages displayed while the function runs: -1 suppresses all messages, 0 displays a minimal set of messages, 1 displays all messages (default).

Details

This function assesses, before and after CRISPRcleanR correction, the statistical difference between the background and the log fold changes of sgRNAs targeting each of the following gene sets:

copy number (CN) deleted genes according to the GDSC1000 repository
CN deleted genes (gistic score = -2) according to the CCLE repository
non expressed genes (FPKM < 0.05)
genes with gistic score = 1
genes with gistic score = 2
non expressed genes (FPKM < 0.05) with gistic score = 1
non expressed genes (FPKM < 0.05) with gistic score = 2
genes with minimal CN = 2, according to the GDSC1000
genes with minimal CN = 4, according to the GDSC1000
genes with minimal CN = 8, according to the GDSC1000
genes with minimal CN = 10, according to the GDSC1000
non expressed genes (FPKM < 0.05) with minimal CN = 2, according to the GDSC1000
non expressed genes (FPKM < 0.05) with minimal CN = 4, according to the GDSC1000
non expressed genes (FPKM < 0.05) with minimal CN = 8, according to the GDSC1000
non expressed genes (FPKM < 0.05) with minimal CN = 10, according to the GDSC1000
core fitness essential genes, assembled from MSigDB signatures [5] and included in the built-in vectors EssGenes.DNA_REPLICATION_cons, EssGenes.KEGG_rna_polymerase, EssGenes.PROTEASOME_cons, EssGenes.ribosomalProteins and EssGenes.SPLICEOSOME_cons
Reference core fitness essential genes assembled from multiple RNAi studies, used as a classification template by the BAGEL algorithm to call gene depletion significance [6] (BAGEL_essential)
Reference core fitness essential genes assembled from multiple RNAi studies, used as a classification template by the BAGEL algorithm to call gene depletion significance [6], after removal of the core fitness essential genes from MSigDB [5]
Reference non essential genes assembled from multiple RNAi studies, used as a classification template by the BAGEL algorithm to call gene depletion significance [6] (BAGEL_nonEssential)
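Most of the gene sets above reduce to simple filters on per-gene annotations. The following is a minimal R sketch on toy data, not the package's implementation: the vectors `fpkm`, `gistic` and `minCN`, the helpers `withGistic`, `withMinCN` and `sgRNAsOf`, and the `GENES` column of the toy library annotation are all hypothetical. It also assumes that "minimal CN = n" means a minimum copy number of at least n.

```r
## Toy per-gene annotations for one cell line (hypothetical values)
fpkm   <- c(GENE1 = 0.01, GENE2 = 12.5, GENE3 = 0.00, GENE4 = 3.2, GENE5 = 0.03)
gistic <- c(GENE1 = 2,    GENE2 = 2,    GENE3 = 1,    GENE4 = -2,  GENE5 = 1)
minCN  <- c(GENE1 = 8,    GENE2 = 10,   GENE3 = 4,    GENE4 = 0,   GENE5 = 2)

## Non expressed genes (FPKM < 0.05)
notExpressed <- names(fpkm)[fpkm < 0.05]
## CN deleted genes according to the gistic score (-2)
ccleDeleted <- names(gistic)[gistic == -2]

## Genes with a given gistic score, optionally restricted to non expressed ones
withGistic <- function(score, onlyNotExpressed = FALSE) {
  g <- names(gistic)[gistic == score]
  if (onlyNotExpressed) intersect(g, notExpressed) else g
}

## Genes with a minimal CN of at least n (assumed interpretation)
withMinCN <- function(n, onlyNotExpressed = FALSE) {
  g <- names(minCN)[minCN >= n]
  if (onlyNotExpressed) intersect(g, notExpressed) else g
}

## Toy library annotation: one row per sgRNA, GENES = targeted gene (assumed column)
lib <- data.frame(GENES = c("GENE1", "GENE1", "GENE2", "GENE3", "GENE4", "GENE5"),
                  row.names = paste0("sg", 1:6))

## Identifiers of the sgRNAs targeting a gene set
sgRNAsOf <- function(genes) rownames(lib)[lib$GENES %in% genes]

## sgRNAs targeting non expressed genes with minimal CN of at least 4
sgRNAsOf(withMinCN(4, onlyNotExpressed = TRUE))
## [1] "sg1" "sg2" "sg4"
```

The selected sgRNA identifiers would then split the log fold changes into a tested set and a background.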

Value

A list of three named 2 x 19 matrices, one per statistical indicator. In each matrix, the two rows refer to sgRNA log fold changes before and after CRISPRcleanR correction, and each of the 19 columns refers to one tested gene set. The matrices contain, respectively:

`PVALS`	P-value of a Student's t-test assessing the difference between the log fold changes of the sgRNAs in the tested set and those of the background, before (first row) and after (second row) CRISPRcleanR correction.
`SIGNS`	The sign of the difference (1 = mean log fold change of the tested set larger than the mean of the background population, -1 = mean log fold change of the tested set smaller than the mean of the background population).
`EFFsizes`	Effect size, computed as Cohen's d: difference of the means divided by the pooled standard deviation.
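The three indicators above can be sketched in R for a single gene set. This is a hedged illustration, not the code of `ccr.perf_statTests`: it assumes a two-sample Student's t-test with equal variances (`var.equal = TRUE`), a background made of the sgRNAs outside the tested set, and the usual pooled standard deviation. The helper `setVsBackground` and the simulated fold changes are hypothetical. Unlike the function, which returns one matrix per indicator, the sketch returns one row per condition (pre/post correction).

```r
## Hypothetical helper: compares the log fold changes of the sgRNAs in a
## tested set with those of the background sgRNAs
setVsBackground <- function(setFCs, bgFCs) {
  ## Two-sample Student's t-test, assuming equal variances
  pval <- t.test(setFCs, bgFCs, var.equal = TRUE)$p.value
  ## 1 if the tested set has a larger mean than the background, -1 otherwise
  sgn <- if (mean(setFCs) > mean(bgFCs)) 1 else -1
  ## Cohen's d: difference of the means / pooled standard deviation
  n1 <- length(setFCs)
  n2 <- length(bgFCs)
  pooledSD <- sqrt(((n1 - 1) * var(setFCs) + (n2 - 1) * var(bgFCs)) /
                     (n1 + n2 - 2))
  effSize <- (mean(setFCs) - mean(bgFCs)) / pooledSD
  c(PVAL = pval, SIGN = sgn, EFFsize = effSize)
}

## Simulated data: the tested set is shifted upwards before correction only
set.seed(1)
res <- rbind(pre  = setVsBackground(rnorm(50, mean = 1), rnorm(1000)),
             post = setVsBackground(rnorm(50, mean = 0), rnorm(1000)))
res
```

Under these assumptions, applying the helper to each of the 19 gene sets, once on the uncorrected and once on the corrected fold changes, would fill the `PVALS`, `SIGNS` and `EFFsizes` matrices column by column.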

Author(s)

Francesco Iorio (francesco.iorio@fht.org)

Source

[a] ftp://ftp.sanger.ac.uk/pub/project/cancerrxgene/releases/release-6.0/Gene_level_CN.xlsx.

References

[1] Forbes SA, Beare D, Boutselakis H, et al. COSMIC: somatic cancer genetics at high-resolution. Nucleic Acids Research. 2017;45(D1):D777-D783.

[2] Iorio F, Behan FM, Goncalves E, Beaver C, Ansari R, Pooley R, et al. Unsupervised correction of gene-independent cell responses to CRISPR-Cas9 targeting. http://doi.org/10.1101/228189

[3] Mermel CH, Schumacher SE, Hill B, et al. GISTIC2.0 facilitates sensitive and confident localization of the targets of focal somatic copy-number alteration in human cancers. Genome Biol. 2011;12(4):R41. doi: 10.1186/gb-2011-12-4-r41.

[4] Garcia-Alonso L, Iorio F, Matchan A, et al. Transcription factor activities enhance markers of drug response in cancer. https://doi.org/10.1101/129478

[5] Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proceedings of the National Academy of Sciences of the United States of America. 2005;102(43):15545-15550. http://doi.org/10.1073/pnas.0506580102

[6] Hart T, Moffat J. BAGEL: a computational framework for identifying essential genes from pooled library screens. BMC Bioinformatics. 2016;17:164.

Examples

## Not run: 
## loading corrected sgRNAs log fold-changes and segment annotations for an example
## cell line (EPLC-272H)
data(EPLC.272HcorrectedFCs)

## loading library annotation
data(KY_Library_v1.0)

## Evaluate correction effects. Boxplots will be saved in EPLC-272H.pdf
## in the current directory
RES <- ccr.perf_statTests('EPLC-272H', libraryAnnotation = KY_Library_v1.0,
                          correctedFCs = EPLC.272HcorrectedFCs$corrected_logFCs)
RES$PVALS
RES$EFFsizes

## End(Not run)

francescojm/CRISPRcleanR documentation built on April 30, 2023, 5:41 a.m.