ccr.cleanChrm | R Documentation |
This function applies a circular binary segmentation algorithm [1, 2] to the genomic-sorted log fold changes of all the sgRNAs targeting genes on the same chromosome. This procedure yields a set of genomic regions of estimated equal sgRNA log fold changes, each differing significantly on average from its adjacent regions. If some of these regions fulfill certain criteria (detailed below), they are deemed to respond to CRISPR-Cas9 targeting in a gene-independent manner (i.e. they might be biased by local features of the DNA), and their pattern of log fold changes is mean centered [3].
ccr.cleanChrm(gwSortedFCs,CHR,display=TRUE,label='',
saveTO=NULL,min.ngenes=3,ignoredGenes=NULL,
capped = FALSE,corrMet = 'mean',alpha = 0.01,
nperm = 10000,p.method ="hybrid",min.width=2,
kmax=25,nmin=200,eta=0.05,trim = 0.025,
undo.splits = "none",undo.prune=0.05,
undo.SD=3)
gwSortedFCs |
A data frame containing genome-wide genomic-sorted sgRNA log fold changes. This data frame must include one named row per sgRNA and the following columns/headers:
This can be generated using the |
CHR |
Numerical value indicating the chromosome to analyse and correct. The X and Y chromosomes must be indicated with 23 and 24, respectively. |
display |
A logical value indicating whether genomic plots showing the results of the biased regions' identification and their log fold change correction should be generated or not. |
label |
A string indicating the experiment name, used in the main title of the plots and for the name of the folder where results are saved. |
saveTO |
If different from NULL, this is the path where the pdf files with the genomic plots showing the results of the biased regions' identification (and their log fold change correction) will be saved (within a folder named as defined in the |
min.ngenes |
A numerical value (>0) specifying the minimal number of different genes that the set of sgRNAs within a region of estimated equal log fold changes should target in order for that region to be corrected, i.e. mean centered. |
ignoredGenes |
A vector of strings containing HGNC symbols of genes that should not be considered when computing the minimal number of different genes targeted by the sgRNAs in the same identified region of estimated equal log fold changes. This vector could contain, for example, a priori known essential genes. This parameter should be set to NULL (default value) for a completely unsupervised correction. |
capped |
Logical value. If TRUE, sgRNAs are prevented from changing the sign of their logFC due to the correction, by capping the corresponding values at 0. The default is FALSE. |
corrMet |
String specifying the correction to be applied. If equal to 'mean' (the default), the mean of the sgRNA logFCs in a biased segment is subtracted from the logFCs of all the sgRNAs in the same biased segment. If different from 'mean', the median of the sgRNA logFCs in a biased segment is subtracted instead. |
alpha |
significance levels for the test to accept change-points (see DNAcopy). |
nperm |
number of permutations used for p-value computation (see DNAcopy). |
p.method |
method used for p-value computation. For the "perm" method the p-value is based on full permutation. For the "hybrid" method the maximum over the entire region is split into maximum of max over small segments and max over the rest. Approximation is used for the larger segment max. Default is hybrid (see DNAcopy). |
min.width |
the minimum number of markers for a changed segment. The default is 2 but can be made larger. Maximum possible value is set at 5 since arbitrary widths can have the undesirable effect of incorrect change-points when a true signal of narrow widths exists (see DNAcopy). |
kmax |
the maximum width of smaller segment for permutation in the hybrid method (see DNAcopy). |
nmin |
the minimum length of data for which the approximation of maximum statistic is used under the hybrid method. Should be larger than 4*kmax (see DNAcopy). |
eta |
the probability to declare a change conditioned on the permuted statistic exceeding the observed statistic exactly j (= 1,...,nperm*alpha) times. (see DNAcopy). |
trim |
proportion of data to be trimmed for variance calculation for smoothing outliers and undoing splits based on SD (see DNAcopy). |
undo.splits |
a character string specifying how change-points are to be undone, if at all. Default is "none". Other choices are "prune", which uses a sum of squares criterion, and "sdundo", which undoes splits that are not at least undo.SD SDs apart (see DNAcopy). |
undo.prune |
the proportional increase in sum of squares allowed when eliminating splits if undo.splits="prune" (see DNAcopy). |
undo.SD |
the number of SDs between means to keep a split if undo.splits="sdundo" (see DNAcopy). |
The rest of the arguments are passed as they are to the segment function of the DNAcopy package.
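As a rough illustration of the arguments that are forwarded, the following sketch runs the segment function of DNAcopy directly on a toy chromosome. The simulated log fold changes, the positions and the sample label are made up, and the way ccr.cleanChrm prepares its input before segmentation is not shown here, so this should be read as an assumption-laden sketch rather than the package's own code. The segmentation values mirror the defaults listed in the usage above.

```r
set.seed(1)
# Toy chromosome: 300 sgRNAs with a depleted block in the middle (made-up values).
fc  <- c(rnorm(100, 0, 0.3), rnorm(100, -1.5, 0.3), rnorm(100, 0, 0.3))
pos <- seq_along(fc) * 1000  # hypothetical genomic coordinates

# DNAcopy is a Bioconductor package; the sketch is skipped if it is not installed.
if (requireNamespace("DNAcopy", quietly = TRUE)) {
  cna <- DNAcopy::CNA(genomdat = fc, chrom = rep(8, length(fc)), maploc = pos,
                      data.type = "logratio", sampleid = "toy")
  seg <- DNAcopy::segment(cna, alpha = 0.01, nperm = 10000, p.method = "hybrid",
                          min.width = 2, kmax = 25, nmin = 200, eta = 0.05,
                          trim = 0.025, undo.splits = "none", undo.prune = 0.05,
                          undo.SD = 3, verbose = 0)
  # One row per segment: start, end, number of markers and segment mean.
  print(seg$output)
}
```

Lowering alpha or switching undo.splits to "sdundo" generally yields fewer, broader segments, which is one way to explore how sensitive the identified regions are to these settings.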
A list containing two data frames. The first one (correctedFCs) contains one named row per sgRNA and the following columns/headers:
CHR: the chromosome of the gene targeted by the sgRNA under consideration;
startp: the genomic coordinate of the starting position of the region targeted by the sgRNA under consideration;
endp: the genomic coordinate of the ending position of the region targeted by the sgRNA under consideration;
genes: the HGNC symbol of the gene targeted by the sgRNA under consideration;
avgFC: the log fold change of the sgRNA averaged across replicates;
correction: the type of correction: 1 = increased, -1 = decreased;
correctedFC: the corrected log fold change of the sgRNA.
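To make the relation between avgFC, correction and correctedFC concrete, here is a minimal sketch of centering a single segment according to the argument descriptions above (min.ngenes, ignoredGenes, corrMet, capped). The helper center_segment and the toy values are hypothetical and are not part of CRISPRcleanR; the package may apply further criteria that this sketch does not reproduce.

```r
# Toy segment: five sgRNAs targeting four genes (made-up values).
seg <- data.frame(
  genes = c("A", "A", "B", "C", "D"),
  avgFC = c(-1.2, -0.8, -1.0, -0.2, -1.3),
  stringsAsFactors = FALSE
)

# Hypothetical helper, not a CRISPRcleanR function.
center_segment <- function(seg, min.ngenes = 3, ignoredGenes = NULL,
                           corrMet = "mean", capped = FALSE) {
  # Count the distinct targeted genes, leaving out the ignored ones.
  n_genes <- length(setdiff(unique(seg$genes), ignoredGenes))
  seg$correction  <- 0
  seg$correctedFC <- seg$avgFC
  if (n_genes < min.ngenes) return(seg)  # too few genes: segment left untouched

  centre <- if (corrMet == "mean") mean(seg$avgFC) else median(seg$avgFC)
  corrected <- seg$avgFC - centre
  if (capped) {
    # Cap at 0 any value whose sign flipped because of the correction.
    flipped <- sign(corrected) * sign(seg$avgFC) < 0
    corrected[flipped] <- 0
  }
  seg$correctedFC <- corrected
  seg$correction  <- sign(corrected - seg$avgFC)  # 1 = increased, -1 = decreased
  seg
}

out        <- center_segment(seg)                 # mean centered
out_capped <- center_segment(seg, capped = TRUE)  # sign flips capped at 0
out_skip   <- center_segment(seg, ignoredGenes = c("C", "D"))  # only 2 genes left
```

In this toy segment the mean is negative, so every sgRNA is shifted upwards; with capped = TRUE the two sgRNAs that would otherwise turn positive are set to 0, and ignoring genes C and D leaves too few genes for the segment to be corrected.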
The second one (regions) contains the identified regions of estimated equal log fold changes (one region per row) and the following columns/headers:
CHR: the chromosome of the region under consideration;
startp: the genomic coordinate of the starting position of the region under consideration;
endp: the genomic coordinate of the ending position of the region under consideration;
n.sgRNAs: the number of sgRNAs targeting sequences in the region under consideration;
avg.logFC: the average log fold change of the sgRNAs targeting the region;
guideIdx: the index range of the sgRNAs targeting the region under consideration, as they appear in the gwSortedFCs provided as input.
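The two data frames can be cross-referenced through their genomic coordinates. The sketch below builds a mock return value with the documented element and column names (all values are made up, and guideIdx is left out because its exact format is not described here), then selects the sgRNAs falling inside a region and computes the shift applied to each of them.

```r
# Mock return value using the documented names; the values are illustrative only.
res <- list(
  correctedFCs = data.frame(
    CHR = 8,
    startp = c(100, 200, 300, 900),
    endp   = c(120, 220, 320, 920),
    genes  = c("A", "A", "B", "C"),
    avgFC       = c(-1.0, -1.2, -0.8, 0.1),
    correction  = c(1, 1, 1, 0),
    correctedFC = c(0.0, -0.2, 0.2, 0.1),
    row.names = c("sg1", "sg2", "sg3", "sg4")
  ),
  regions = data.frame(CHR = 8, startp = 100, endp = 320,
                       n.sgRNAs = 3, avg.logFC = -1.0)
)

fcs <- res$correctedFCs
reg <- res$regions[1, ]

# sgRNAs whose targeted interval lies inside the first region.
in_region <- fcs$CHR == reg$CHR & fcs$startp >= reg$startp & fcs$endp <= reg$endp

# Shift applied to each sgRNA by the correction.
shift <- fcs$correctedFC - fcs$avgFC
print(data.frame(sgRNA = rownames(fcs), in_region, shift))
```

With a real result, the same subsetting can be used to check that the number of selected sgRNAs matches n.sgRNAs for each region.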
Francesco Iorio (francesco.iorio@fht.org)
[1] Olshen, A. B., Venkatraman, E. S., Lucito, R., Wigler, M. (2004). Circular binary segmentation for the analysis of array-based DNA copy number data. Biostatistics 5: 557-572.
[2] Venkatraman, E. S., Olshen, A. B. (2007). A faster circular binary segmentation algorithm for the analysis of array CGH data. Bioinformatics 23: 657-63.
[3] Iorio, F., Behan, F. M., Goncalves, E., Beaver, C., Ansari, R., Pooley, R., et al. (n.d.). Unsupervised correction of gene-independent cell responses to CRISPR-Cas9 targeting. http://doi.org/10.1101/228189
ccr.logFCs2chromPos, ccr.NormfoldChanges
## Not run:
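# Load the sgRNA library annotation shipped with the package
data(KY_Library_v1.0)
# Path to the example raw counts file
fn<-paste(system.file('extdata', package = 'CRISPRcleanR'),'/HT-29_counts.tsv',sep='')
# Normalise the counts and compute the sgRNA log fold changes
normANDfcs<-ccr.NormfoldChanges(fn,min_reads=30,EXPname='Example',
                                libraryAnnotation=KY_Library_v1.0)
# Sort the log fold changes by genomic position
gwSortedFCs<-ccr.logFCs2chromPos(normANDfcs$logFCs,KY_Library_v1.0)
# Identify and correct biased regions on chromosome 8
chr8cleaned<-ccr.cleanChrm(gwSortedFCs,8,display=TRUE,label='HT-29',
                           min.ngenes=3)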
## End(Not run)