ccr.perf_statTests | R Documentation |
This function tests whether the log fold changes of sgRNAs targeting different sets of genes differ significantly from background, both pre and post CRISPRcleanR correction. It creates two sets of boxplots with the outcomes and returns statistical indicators.
ccr.perf_statTests(cellLine, libraryAnnotation, correctedFCs,
outDir = "./",
GDSC.geneLevCNA = NULL,
CCLE.gisticCNA = NULL,
RNAseq.fpkms = NULL,
GDSC.CL_annotation=NULL,
verbose = c(-1, 0, 1))
cellLine |
A string specifying the name of a cell line (or a COSMIC identifier [1]); |
libraryAnnotation |
The sgRNA library annotations formatted as specified in the reference manual entry of the |
correctedFCs |
sgRNA log fold changes corrected for gene-independent responses to CRISPR-Cas9 targeting, generated with the function |
outDir |
The path of the folder where the boxplots will be saved. |
GDSC.geneLevCNA |
Genome-wide copy number data with the same format as |
CCLE.gisticCNA |
Genome-wide Gistic [3] scores quantifying copy number status across cell lines with the same format as |
RNAseq.fpkms |
Genome-wide substitute reads with fragments per kilobase of exon per million reads mapped (FPKM) across cell lines. These can be derived from a comprehensive collection of RNAseq profiles described in [4]. The format must be the same as that of the |
GDSC.CL_annotation |
Cell line annotation data frame with the same structure as the |
verbose |
Numeric value. It determines the level of detail in the messages displayed while running the function: -1 suppresses all messages, 0 displays a minimal set of messages, 1 displays all messages (default). |
This function assesses the statistical difference, pre and post CRISPRcleanR correction, of log fold changes for sgRNAs targeting, respectively:
copy number (CN) deleted genes according to the GDSC1000 repository
CN deleted genes (gistic score = -2) according to the CCLE repository
non expressed genes (FPKM < 0.05)
genes with gistic score = 1
genes with gistic score = 2
non expressed genes (FPKM < 0.05) with gistic score = 1
non expressed genes (FPKM < 0.05) with gistic score = 2
genes with minimal CN = 2, according to the GDSC1000
genes with minimal CN = 4, according to the GDSC1000
genes with minimal CN = 8, according to the GDSC1000
genes with minimal CN = 10, according to the GDSC1000
non expressed genes (FPKM < 0.05) with minimal CN = 2, according to the GDSC1000
non expressed genes (FPKM < 0.05) with minimal CN = 4, according to the GDSC1000
non expressed genes (FPKM < 0.05) with minimal CN = 8, according to the GDSC1000
non expressed genes (FPKM < 0.05) with minimal CN = 10, according to the GDSC1000
core fitness essential genes, assembling signatures from MsigDB [5], included in the builtin vectors EssGenes.DNA_REPLICATION_cons, EssGenes.KEGG_rna_polymerase, EssGenes.PROTEASOME_cons, EssGenes.ribosomalProteins, EssGenes.SPLICEOSOME_cons
Reference core fitness essential genes assembled from multiple RNAi studies used as classification template by the BAGEL algorithm to call gene depletion significance [6] (BAGEL_essential)
Reference core fitness essential genes assembled from multiple RNAi studies used as classification template by the BAGEL algorithm to call gene depletion significance [6], after the removal of core fitness essential genes from MsigDB [5]
Reference non essential genes assembled from multiple RNAi studies used as classification template by the BAGEL algorithm to call gene depletion significance [6] (BAGEL_nonEssential)
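As a rough illustration of how some of the gene sets listed above could be assembled, the following base-R sketch selects non expressed genes (FPKM < 0.05) and intersects them with genes having a given gistic score. The toy vectors `fpkm` and `gistic` and the helper `select_genes` are hypothetical and are not part of CRISPRcleanR; the package's internal implementation may differ.

```r
## Toy per-gene values for a single cell line (invented for illustration)
fpkm   <- c(GENE_A = 0.00, GENE_B = 12.3, GENE_C = 0.01, GENE_D = 0.04, GENE_E = 5.1)
gistic <- c(GENE_A = 2,    GENE_B = 2,    GENE_C = 1,    GENE_D = -2,   GENE_E = 0)

## Hypothetical helper: names of the genes whose value satisfies a condition
select_genes <- function(values, condition) {
  names(values)[condition(values)]
}

## Non expressed genes (FPKM < 0.05)
notExpressed <- select_genes(fpkm, function(x) x < 0.05)

## Genes with gistic score = 2, and CN deleted genes (gistic score = -2)
gistic2     <- select_genes(gistic, function(x) x == 2)
ccleDeleted <- select_genes(gistic, function(x) x == -2)

## Non expressed genes (FPKM < 0.05) with gistic score = 2
notExpressed_gistic2 <- intersect(notExpressed, gistic2)
```

The sgRNAs targeting each selected gene would then be looked up in the library annotation before comparing their log fold changes with the background.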
A list of three named 2x19 matrices, with one entry per statistical test: rows indicate sgRNA log fold changes pre/post CRISPRcleanR correction, and there is one column per tested gene set. The entries of each matrix contain, respectively:
PVALS |
P-value resulting from a Student's t-test assessing the difference between sgRNA log fold changes and background, pre (first row) and post (second row) CRISPRcleanR correction |
SIGNS |
The sign of the difference (1 = mean log fold change of the tested set larger than the mean of the background population, -1 = mean log fold change of the tested set smaller than the mean of the background population) |
EFFsizes |
Effect size (computed via Cohen's d): difference of the means / pooled standard deviation. |
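The three indicators described above can be sketched in base R for a single gene set as follows. This is a minimal illustration on simulated data, not the package's code: the helper `compare_to_background` is hypothetical, the use of the equal-variance form of the t-test is an assumption, and the effect size follows the formula above literally (signed difference of the means over the pooled standard deviation).

```r
## Hypothetical helper (not part of CRISPRcleanR): compares the log fold
## changes of a tested set of sgRNAs against a background population
compare_to_background <- function(tested, background) {
  ## Classic Student's t-test; equal variances are an assumption here
  tt <- t.test(tested, background, var.equal = TRUE)
  n1 <- length(tested)
  n2 <- length(background)
  ## Pooled standard deviation of the two groups
  pooledSD <- sqrt(((n1 - 1) * var(tested) + (n2 - 1) * var(background)) /
                     (n1 + n2 - 2))
  meanDiff <- mean(tested) - mean(background)
  c(PVAL    = unname(tt$p.value),
    SIGN    = sign(meanDiff),
    EFFsize = meanDiff / pooledSD)
}

set.seed(1)
background <- rnorm(1000, mean = 0,  sd = 1)  # simulated background logFCs
tested     <- rnorm(50,   mean = -1, sd = 1)  # simulated depleted gene set

res <- compare_to_background(tested, background)
```

Applying such a helper to the uncorrected and to the corrected log fold changes of each gene set would give the two rows of the matrices described above.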
Francesco Iorio (francesco.iorio@fht.org)
[a] ftp://ftp.sanger.ac.uk/pub/project/cancerrxgene/releases/release-6.0/Gene_level_CN.xlsx.
[1] Forbes SA, Beare D, Boutselakis H, et al. COSMIC: somatic cancer genetics at high-resolution. Nucleic Acids Research, Volume 45, Issue D1, 4 January 2017, Pages D777-D783.
[2] Iorio, F., Behan, F. M., Goncalves, E., Beaver, C., Ansari, R., Pooley, R., et al. (n.d.). Unsupervised correction of gene-independent cell responses to CRISPR-Cas9 targeting. http://doi.org/10.1101/228189
[3] Mermel CH, Schumacher SE, Hill B, et al. GISTIC2.0 facilitates sensitive and confident localization of the targets of focal somatic copy-number alteration in human cancers. Genome Biol. 2011;12(4):R41. doi: 10.1186/gb-2011-12-4-r41.
[4] Garcia-Alonso L, Iorio F, Matchan A, et al. Transcription factor activities enhance markers of drug response in cancer. doi: https://doi.org/10.1101/129478
[5] Subramanian, A., Tamayo, P., Mootha, V. K., Mukherjee, S., Ebert, B. L., Gillette, M. A., et al. (2005). Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proceedings of the National Academy of Sciences of the United States of America, 102(43), 15545-15550. http://doi.org/10.1073/pnas.0506580102
[6] BAGEL: a computational framework for identifying essential genes from pooled library screens. Traver Hart and Jason Moffat. BMC Bioinformatics, 2016 vol. 17 p. 164.
KY_Library_v1.0, ccr.GWclean, GDSC.geneLevCNA, CCLE.gisticCNA, RNAseq.fpkms, EssGenes.DNA_REPLICATION_cons, EssGenes.KEGG_rna_polymerase, EssGenes.PROTEASOME_cons, EssGenes.ribosomalProteins, EssGenes.SPLICEOSOME_cons, BAGEL_essential, BAGEL_nonEssential
## Not run:
library(CRISPRcleanR)
## Load corrected sgRNA log fold changes and segment annotations for an example
## cell line (EPLC-272H)
data(EPLC.272HcorrectedFCs)
## Load the library annotation
data(KY_Library_v1.0)
## Evaluate correction effects. Boxplots will be saved in EPLC-272H.pdf
## in the current directory
RES <- ccr.perf_statTests('EPLC-272H',
                          libraryAnnotation = KY_Library_v1.0,
                          correctedFCs = EPLC.272HcorrectedFCs$corrected_logFCs)
## Inspect p-values and effect sizes (rows: pre/post correction; columns: gene sets)
RES$PVALS
RES$EFFsizes
## End(Not run)
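The returned list can be explored with ordinary matrix operations. The sketch below uses a small mock object that follows the layout described above for the returned value (first row pre correction, second row post correction, one column per gene set). The three gene set names and all numbers are invented for illustration and are not real output of ccr.perf_statTests.

```r
## Mock p-value matrix with 3 gene sets instead of 19 (all values invented).
## matrix() fills column by column, so each pair below is (pre, post).
mockRES <- list(
  PVALS = matrix(c(1e-10, 0.20,    # setA
                   1e-30, 1e-28,   # setB
                   0.03,  0.40),   # setC
                 nrow = 2,
                 dimnames = list(NULL, c("setA", "setB", "setC")))
)

alpha   <- 0.05
sigPre  <- mockRES$PVALS[1, ] < alpha   # first row: pre correction
sigPost <- mockRES$PVALS[2, ] < alpha   # second row: post correction

## Gene sets that differ significantly from background before the
## correction but no longer do so afterwards
attenuated <- names(which(sigPre & !sigPost))
attenuated
```

On a real result the same indexing would be applied to RES$PVALS, with the significance threshold chosen by the user.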