ccr.GWclean | R Documentation |
This function takes as input a genome-wide essentiality profile derived from a CRISPR-Cas9 experiment employing a pooled library of single guide RNAs (sgRNAs) targeting protein coding genes, which are transfected in an in vitro model stably expressing Cas9.
The essentiality profile quantifies the loss/gain-of-fitness caused by each sgRNA targeting, and it is expressed as log fold changes (logFCs) between the abundance of the sgRNAs at an end point after cell purification and their abundance in the plasmid pool used for viral production, or at an initial time point, or in any other control condition.
A circular binary segmentation algorithm [1, 2] is applied by this function to the genome-wide pattern of logFCs provided as input, in order to identify genomic regions that include sgRNAs with sufficiently equal logFCs (and a mean logFC sufficiently different from background) and that target a minimal number of different genes.
Since it is very unlikely to observe the same loss/gain-of-fitness effect when targeting a large number of contiguous genes, if certain user-defined conditions (detailed below) are met, the logFCs of such regions are deemed biased by some local feature of the involved genomic segment (for example, a copy number amplification [3]) and are corrected, i.e. mean centered [4].
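The correction idea described above can be illustrated with a minimal base R sketch. It is not the package's implementation: the toy logFC values, the gene labels, the segment boundaries (`seg_idx`) and the variable names are all invented for illustration, and the segment is simply centered on zero, which is one reading of "mean centered".

```r
# Toy genome-sorted profile: 12 sgRNAs, positions 5-10 share a shifted logFC level
logfc <- c(0.1, -0.2, 0.0, 0.1, -1.9, -2.1, -2.0, -1.8, -2.2, -2.0, 0.2, -0.1)
genes <- c("G1", "G1", "G2", "G2", "G3", "G3", "G4", "G4", "G5", "G5", "G6", "G6")

# Hypothetical segment, given as the index range a segmentation step might report
seg_idx <- 5:10
min_ngenes <- 3  # plays the role of the min.ngenes argument

corrected <- logfc
# Correct only if the segment's sgRNAs target enough different genes
if (length(unique(genes[seg_idx])) >= min_ngenes) {
  corrected[seg_idx] <- logfc[seg_idx] - mean(logfc[seg_idx])
}

# Direction of the correction: 1 = increased, -1 = decreased, 0 = unchanged
correction <- sign(corrected - logfc)
round(corrected, 2)
```

In this toy case the segment spans three genes (G3, G4, G5), so its six logFCs are shifted up by about 2 and everything outside the segment is left untouched.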
ccr.GWclean(gwSortedFCs,label='',display=TRUE,
saveTO=NULL,ignoredGenes=NULL,min.ngenes=3,
alpha = 0.01,
nperm = 10000,
p.method ="hybrid",
min.width=2,
kmax=25,
nmin=200,
eta=0.05,
trim = 0.025,
undo.splits = "none",
undo.prune=0.05,
undo.SD=3)
gwSortedFCs |
A data frame containing genome-wide, genomic-sorted log fold changes of the sgRNAs. This data frame must include one named row per sgRNA and the following columns/headers:
This can be generated using the |
label |
A string indicating the experiment name. This is used to compose the main title of the plots generated by this function and the name of the folder where the results are saved. |
display |
A logical value indicating whether genomic plots showing the results of the biased regions' identification and their log fold change correction should be generated or not. |
saveTO |
If different from NULL, this parameter contains the path where pdf files with the genomic plots showing the results of the biased regions' identification (and their log fold change correction) will be saved (within a folder named as defined in the |
ignoredGenes |
A vector of strings containing HGNC symbols of genes that should not be considered when computing the minimal number of different genes targeted by sgRNAs in the same identified region of estimated equal log fold changes. This could contain, for example, a-priori known essential genes. |
min.ngenes |
A numerical value (>0) specifying the minimal number of different genes that the set of sgRNAs within a region of estimated equal logFCs should target in order for their logFCs to be corrected, i.e. mean centered. |
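How `ignoredGenes` and `min.ngenes` interact can be sketched in base R as follows. The helper `count_region_genes` and the gene lists are hypothetical and only mirror the behaviour described above; the package may count genes differently internally.

```r
# Hypothetical helper: number of distinct genes in a region,
# excluding any gene listed in `ignored`
count_region_genes <- function(genes, ignored = NULL) {
  length(setdiff(unique(genes), ignored))
}

# Genes targeted by the sgRNAs of one candidate region (toy values)
region_genes <- c("RPL3", "RPL3", "MYC", "MYC", "PVT1", "GSDMC")
ignored <- c("RPL3")  # e.g. an a-priori known essential gene

n_genes <- count_region_genes(region_genes, ignored)
n_genes >= 3  # would this region pass min.ngenes = 3?
```

Ignoring known essential genes makes the count more conservative: a region whose shared logFC is explained by a few truly essential genes is less likely to be treated as biased.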
alpha |
significance levels for the test to accept change-points (see DNAcopy). |
nperm |
number of permutations used for p-value computation (see DNAcopy). |
p.method |
method used for p-value computation. For the "perm" method the p-value is based on full permutation. For the "hybrid" method the maximum over the entire region is split into maximum of max over small segments and max over the rest. Approximation is used for the larger segment max. Default is hybrid (see DNAcopy). |
min.width |
the minimum number of markers for a changed segment. The default is 2 but can be made larger. Maximum possible value is set at 5 since arbitrary widths can have the undesirable effect of incorrect change-points when a true signal of narrow widths exists (see DNAcopy). |
kmax |
the maximum width of smaller segment for permutation in the hybrid method (see DNAcopy). |
nmin |
the minimum length of data for which the approximation of maximum statistic is used under the hybrid method. It should be larger than 4*kmax (see DNAcopy). |
eta |
the probability to declare a change conditioned on the permuted statistic exceeding the observed statistic exactly j (= 1,...,nperm*alpha) times. (see DNAcopy). |
trim |
proportion of data to be trimmed for variance calculation for smoothing outliers and undoing splits based on SD (see DNAcopy). |
undo.splits |
a character string specifying how change-points are to be undone, if at all. Default is "none". Other choices are "prune", which uses a sum of squares criterion, and "sdundo", which undoes splits that are not at least this many SDs apart. (see DNAcopy). |
undo.prune |
the proportional increase in sum of squares allowed when eliminating splits if undo.splits="prune" (see DNAcopy). |
undo.SD |
the number of SDs between means to keep a split if undo.splits="sdundo" (see DNAcopy). |
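The constraints stated in the argument descriptions above can be checked before running a long segmentation. The following is a small sketch; `check_segment_args` is a hypothetical helper, not part of CRISPRcleanR or DNAcopy, and it only encodes the rules quoted on this page.

```r
# Hypothetical pre-flight check of the segmentation arguments
check_segment_args <- function(min.width = 2, kmax = 25, nmin = 200,
                               p.method = "hybrid", undo.splits = "none") {
  problems <- character(0)
  if (min.width < 2 || min.width > 5) {
    problems <- c(problems, "min.width must be between 2 and 5")
  }
  if (nmin <= 4 * kmax) {
    problems <- c(problems, "nmin should be larger than 4*kmax")
  }
  if (!p.method %in% c("hybrid", "perm")) {
    problems <- c(problems, "p.method must be 'hybrid' or 'perm'")
  }
  if (!undo.splits %in% c("none", "prune", "sdundo")) {
    problems <- c(problems, "undo.splits must be 'none', 'prune' or 'sdundo'")
  }
  problems  # empty character vector when everything is consistent
}

check_segment_args()           # defaults: no problems reported
check_segment_args(kmax = 60)  # nmin = 200 is not larger than 4 * 60
```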
The rest of the arguments are passed to the segment function of the DNAcopy package as they are.
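To see what these forwarded arguments control, the sketch below calls DNAcopy directly on simulated data, using the same default values listed in the usage. It is only an illustration under assumptions: the simulated logFCs, positions and the sample name are invented, the smoothing step is included as a typical DNAcopy workflow rather than as a statement about what ccr.GWclean does internally, and the segmentation is skipped if DNAcopy is not installed.

```r
set.seed(1)
# Simulated profile: 300 sgRNAs on one chromosome, the middle 100 shifted to about -2
sim_logfc <- c(rnorm(100, 0, 0.3), rnorm(100, -2, 0.3), rnorm(100, 0, 0.3))
sim_pos <- seq(1e6, by = 1e4, length.out = 300)

if (requireNamespace("DNAcopy", quietly = TRUE)) {
  # Wrap the logFCs in a copy-number-array object
  cna <- DNAcopy::CNA(cbind(sim_logfc), chrom = rep(1, 300), maploc = sim_pos,
                      data.type = "logratio", sampleid = "toy")
  cna <- DNAcopy::smooth.CNA(cna)  # outlier smoothing
  # Circular binary segmentation with the defaults shown in the usage
  seg <- DNAcopy::segment(cna, alpha = 0.01, nperm = 10000, p.method = "hybrid",
                          min.width = 2, kmax = 25, nmin = 200, eta = 0.05,
                          trim = 0.025, undo.splits = "none",
                          undo.prune = 0.05, undo.SD = 3, verbose = 0)
  print(seg$output)  # one row per segment, with num.mark and seg.mean
}
```

Lowering `alpha` or setting `undo.splits = "sdundo"` generally yields fewer, broader segments, which in turn changes which regions reach the `min.ngenes` threshold.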
A list containing two data frames and a vector of strings. The first data frame (corrected_logFCs) contains one named row per sgRNA and the following columns/headers:
CHR: the chromosome of the gene targeted by the sgRNA under consideration;
startp: the genomic coordinate of the starting position of the region targeted by the sgRNA under consideration;
endp: the genomic coordinate of the ending position of the region targeted by the sgRNA under consideration;
genes: the HGNC symbol of the gene targeted by the sgRNA under consideration;
avgFC: the log fold change of the sgRNA averaged across replicates;
correction: the type of correction: 1 = increased log fold change, -1 = decreased log fold change, 0 = no correction;
correctedFC: the corrected log fold change of the sgRNA.
The second data frame (segments) contains the identified regions of estimated equal log fold changes (one region per row) and the following columns/headers:
CHR: the chromosome of the region under consideration;
startp: the genomic coordinate of the starting position of the region under consideration;
endp: the genomic coordinate of the ending position of the region under consideration;
n.sgRNAs: the number of sgRNAs targeting sequences in the region under consideration;
avg.logFC: the average log fold change of the sgRNAs in the region;
guideIdx: the index range of the sgRNAs targeting the region under consideration, as they appear in the gwSortedFCs provided as input.
The vector of strings (SORTED_sgRNAs) contains the sgRNAs' identifiers in the same order as they are reported in the gwSortedFCs input data frame, i.e. genome sorted.
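The returned list can then be explored with ordinary data frame operations. The sketch below builds a small mock object with the element and column names documented above; all values, sgRNA identifiers and gene names are invented, and the text format used for guideIdx is a guess.

```r
# Mock of the documented return structure (all values are invented)
res <- list(
  corrected_logFCs = data.frame(
    CHR = c("1", "1", "1", "2"),
    startp = c(100, 250, 900, 50),
    endp = c(120, 270, 920, 70),
    genes = c("GENE_A", "GENE_A", "GENE_B", "GENE_C"),
    avgFC = c(-2.1, -1.9, -2.0, 0.1),
    correction = c(1, 1, 1, 0),
    correctedFC = c(-0.1, 0.1, 0.0, 0.1),
    row.names = c("sg1", "sg2", "sg3", "sg4")
  ),
  segments = data.frame(
    CHR = c("1", "2"), startp = c(100, 50), endp = c(920, 70),
    n.sgRNAs = c(3, 1), avg.logFC = c(-2.0, 0.1),
    guideIdx = c("1, 3", "4, 4")  # assumed formatting of the index range
  ),
  SORTED_sgRNAs = c("sg1", "sg2", "sg3", "sg4")
)

# How many sgRNAs were increased, decreased or left unchanged
table(res$corrected_logFCs$correction)

# Gene-level summary: average corrected logFC per targeted gene
gene_level <- tapply(res$corrected_logFCs$correctedFC,
                     res$corrected_logFCs$genes, mean)
gene_level
```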
Francesco Iorio (francesco.iorio@fht.org)
[1] Olshen, A. B., Venkatraman, E. S., Lucito, R., Wigler, M. (2004). Circular binary segmentation for the analysis of array-based DNA copy number data. Biostatistics 5: 557-572.
[2] Venkatraman, E. S., Olshen, A. B. (2007). A faster circular binary segmentation algorithm for the analysis of array CGH data. Bioinformatics 23: 657-63.
[3] Andrew J. Aguirre, Robin M. Meyers, Barbara A. Weir, Francisca Vazquez, Cheng-Zhong Zhang, Uri Ben-David, April Cook, Gavin Ha, William F. Harrington, Mihir B. Doshi, Maria Kost-Alimova, Stanley Gill, Han Xu, Levi D. Ali, Guozhi Jiang, Sasha Pantel, Yenarae Lee, Amy Goodale, Andrew D. Cherniack, Coyin Oh, Gregory Kryukov, Glenn S. Cowley, Levi A. Garraway, Kimberly Stegmaier, Charles W. Roberts, Todd R. Golub, Matthew Meyerson, David E. Root, Aviad Tsherniak and William C. Hahn. Genomic copy number dictates a gene-independent cell response to CRISPR-Cas9 targeting. Cancer Discov June 3 2016 DOI: 10.1158/2159-8290.CD-16-0154
[4] Iorio, F., Behan, F. M., Goncalves, E., Beaver, C., Ansari, R., Pooley, R., et al. (n.d.). Unsupervised correction of gene-independent cell responses to CRISPR-Cas9 targeting. http://doi.org/10.1101/228189
ccr.cleanChrm
## Not run:
## Loading sgRNA library annotation file
data(KY_Library_v1.0)
## Deriving the path of the file with the example dataset,
## from the mutagenesis of the HT-29 colorectal cancer cell line
fn<-paste(system.file('extdata', package = 'CRISPRcleanR'),'/HT-29_counts.tsv',sep='')
## Loading, median-normalizing and computing fold-changes for the example dataset
normANDfcs<-ccr.NormfoldChanges(fn,min_reads=30,EXPname='HT-29',
libraryAnnotation = KY_Library_v1.0)
## Genome-sorting of the fold changes
gwSortedFCs<-ccr.logFCs2chromPos(normANDfcs$logFCs,KY_Library_v1.0)
## Identifying and correcting biased sgRNAs' fold changes
correctedFCs<-ccr.GWclean(gwSortedFCs,display=TRUE,label='HT-29')
## Visualising the first entries of the corrected fold changes
head(correctedFCs$corrected_logFCs)
## End(Not run)