ccr.perf_distributions    R Documentation

CRISPRcleanR correction assessment: inspection of sgRNA log fold changes distributions

Description

This function creates density plots of the sgRNA log fold change distributions for predefined sets of targeted genes, before and after CRISPRcleanR correction.

Usage

ccr.perf_distributions(cellLine, correctedFCs,
                       GDSC.geneLevCNA = NULL,
                       CCLE.gisticCNA = NULL,
                       RNAseq.fpkms = NULL,
                       minCNs = c(8, 10),
                       libraryAnnotation,
                       GDSC.CL_annotation = NULL)

Arguments

cellLine

A string specifying the name of a cell line (or a COSMIC identifier [1]).

correctedFCs

sgRNA log fold changes corrected for gene-independent responses to CRISPR-Cas9 targeting, generated with the function ccr.GWclean (the first data frame in the list returned by ccr.GWclean, i.e. corrected_logFCs).

GDSC.geneLevCNA

Genome-wide copy number data in the same format as the built-in GDSC.geneLevCNA data frame. This can be assembled from the Excel sheet specified in the Source section [a] (containing data for the GDSC1000 cell lines). If NULL, this function uses the built-in GDSC.geneLevCNA data frame, which contains data derived from [a] for the 15 cell lines used in [2] to assess the performance of CRISPRcleanR.

CCLE.gisticCNA

Genome-wide GISTIC [3] scores quantifying copy number status across cell lines, in the same format as the built-in CCLE.gisticCNA data frame. If NULL, this function uses the built-in CCLE.gisticCNA data frame, which contains data for 13 of the 15 cell lines used in [2] to assess the performance of CRISPRcleanR.

RNAseq.fpkms

Genome-wide gene expression levels, as fragments per kilobase of exon per million reads mapped (FPKM), across cell lines. These can be derived from the comprehensive collection of RNAseq profiles described in [4]. The format must be the same as that of the built-in RNAseq.fpkms data frame. If NULL, this function uses the built-in RNAseq.fpkms data frame, which contains data for the 15 cell lines used in [2] to assess CRISPRcleanR results.

minCNs

A numeric vector with two entries, each specifying a minimal copy number at which a gene is considered amplified, based on the data in GDSC.geneLevCNA. Each of the two values can be 2, 4, 8 or 10.
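
As a minimal sketch of the constraint above, the hypothetical helper check_minCNs (not part of CRISPRcleanR) tests whether a candidate minCNs vector has exactly two entries drawn from 2, 4, 8 and 10. How the package itself validates this argument is not stated in this entry, so this is only an assumed pre-check that a caller might run.

```r
# Hypothetical helper, not part of CRISPRcleanR: checks a candidate
# minCNs argument against the constraints stated in this entry.
check_minCNs <- function(minCNs, allowed = c(2, 4, 8, 10)) {
  is.numeric(minCNs) && length(minCNs) == 2 && all(minCNs %in% allowed)
}

check_minCNs(c(8, 10))  # the documented default
check_minCNs(c(3, 8))   # 3 is not an allowed value
check_minCNs(8)         # only one entry
```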

libraryAnnotation

The sgRNA library annotation, formatted as specified in the reference manual entry of the built-in KY_Library_v1.0 library.

GDSC.CL_annotation

Cell line annotation data frame with the same structure as the built-in GDSC.CL_annotation data frame. If NULL, the built-in GDSC.CL_annotation is used.

Details

This function generates four sets of plots. Each set contains density plots of the log fold change distributions before and after CRISPRcleanR correction, respectively for:

  • (i) Copy number amplified genes according to the data in GDSC.geneLevCNA based on the two threshold values specified in minCNs;

  • (ii) Copy number amplified genes according to the data in CCLE.gisticCNA (gistic score = +2);

  • (iii) Copy number amplified non expressed genes according to the data in GDSC.geneLevCNA based on the two threshold values specified in minCNs, and the data in RNAseq.fpkms (FPKM < 0.05);

  • (iv) Reference sets of core fitness essential genes from MSigDB [5] (included in the built-in vectors EssGenes.DNA_REPLICATION_cons, EssGenes.KEGG_rna_polymerase,
    EssGenes.PROTEASOME_cons, EssGenes.ribosomalProteins and
    EssGenes.SPLICEOSOME_cons), and reference core fitness essential and non-essential genes assembled from multiple RNAi studies, used as classification templates by the BAGEL algorithm to call gene depletion significance [6]
    (BAGEL_essential, BAGEL_nonEssential).
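
The gene selection in (iii) and the prior/post comparison can be sketched on synthetic data as follows. This is an illustration only, not the package's implementation: the toy data frames, their column names (gene, cn, fpkm, sgRNA, avgFC, correctedFC) and the simulated values are all assumptions made for the example, and the real GDSC.geneLevCNA and RNAseq.fpkms objects may be structured differently.

```r
set.seed(1)

# Toy gene-level annotation (hypothetical columns): copy number and FPKM.
genes <- data.frame(
  gene = paste0("G", 1:200),
  cn   = sample(c(2, 4, 8, 10), 200, replace = TRUE),
  fpkm = runif(200, min = 0, max = 0.1)
)

# Toy sgRNA table with 5 guides per gene. Amplified genes (cn >= 8) get a
# simulated negative bias in the uncorrected values (avgFC) that is absent
# from the "corrected" values (correctedFC).
sg <- data.frame(
  sgRNA = paste0("sg", 1:1000),
  gene  = rep(genes$gene, each = 5)
)
bias <- ifelse(genes$cn[match(sg$gene, genes$gene)] >= 8, -1, 0)
sg$correctedFC <- rnorm(nrow(sg), mean = 0, sd = 0.5)
sg$avgFC       <- sg$correctedFC + bias

# Set (iii): amplified (copy number >= threshold) and non-expressed
# (FPKM < 0.05) genes.
minCN    <- 8
selected <- genes$gene[genes$cn >= minCN & genes$fpkm < 0.05]
idx      <- sg$gene %in% selected

# Density of log fold changes for the selected guides, prior/post correction.
d_pre  <- density(sg$avgFC[idx])
d_post <- density(sg$correctedFC[idx])

plot(d_pre, col = "red", main = "Amplified non-expressed genes",
     xlab = "sgRNA log fold change",
     xlim = range(d_pre$x, d_post$x), ylim = range(0, d_pre$y, d_post$y))
lines(d_post, col = "blue")
legend("topleft", legend = c("pre-correction", "post-correction"),
       col = c("red", "blue"), lty = 1)
```

In this toy setting the pre-correction density is shifted towards negative values relative to the post-correction one, which is the kind of shift these plots are meant to make visible.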

Author(s)

Francesco Iorio (francesco.iorio@fht.org)

Source

[a] ftp://ftp.sanger.ac.uk/pub/project/cancerrxgene/releases/release-6.0/Gene_level_CN.xlsx.

References

[1] Forbes SA, Beare D, Boutselakis H, et al. COSMIC: somatic cancer genetics at high-resolution. Nucleic Acids Research, Volume 45, Issue D1, 4 January 2017, Pages D777-D783.

[2] Iorio, F., Behan, F. M., Goncalves, E., Beaver, C., Ansari, R., Pooley, R., et al. (n.d.). Unsupervised correction of gene-independent cell responses to CRISPR-Cas9 targeting.
http://doi.org/10.1101/228189

[3] Mermel CH, Schumacher SE, Hill B, et al. GISTIC2.0 facilitates sensitive and confident localization of the targets of focal somatic copy-number alteration in human cancers. Genome Biol. 2011;12(4):R41. doi: 10.1186/gb-2011-12-4-r41.

[4] Garcia-Alonso L, Iorio F, Matchan A, et al. Transcription factor activities enhance markers of drug response in cancer doi: https://doi.org/10.1101/129478

[5] Subramanian, A., Tamayo, P., Mootha, V. K., Mukherjee, S., Ebert, B. L., Gillette, M. A., et al. (2005). Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proceedings of the National Academy of Sciences of the United States of America, 102(43), 15545-15550. http://doi.org/10.1073/pnas.0506580102

[6] Hart T, Moffat J. BAGEL: a computational framework for identifying essential genes from pooled library screens. BMC Bioinformatics. 2016;17:164.

See Also

KY_Library_v1.0, ccr.GWclean,
GDSC.geneLevCNA, CCLE.gisticCNA, RNAseq.fpkms,
EssGenes.DNA_REPLICATION_cons, EssGenes.KEGG_rna_polymerase, EssGenes.PROTEASOME_cons, EssGenes.ribosomalProteins, EssGenes.SPLICEOSOME_cons,
BAGEL_essential, BAGEL_nonEssential

Examples

## Not run: 
## loading corrected sgRNA log fold changes and segment annotations for an example
## cell line (HT-29)
data(HT.29correctedFCs)

## loading library annotation
data(KY_Library_v1.0)

## inspecting sgRNA log fold change distributions prior/post CRISPRcleanR correction
ccr.perf_distributions('HT-29', HT.29correctedFCs$corrected_logFCs,
                       libraryAnnotation = KY_Library_v1.0)


## End(Not run)

francescojm/CRISPRcleanR documentation built on April 30, 2023, 5:41 a.m.