call.abnormal.cov: Call abnormal bins
In jmonlong/PopSV: Population-based detection of structural variants from Read-Depth signal

Detect abnormal bins from the Z-score distribution. A normal distribution is first fitted to the Z-score distribution. P-values are computed from this estimated null distribution and corrected for multiple testing. Optionally, consecutive bins with abnormal read counts can then be merged.

call.abnormal.cov(files.df, samp, out.pdf = NULL, FDR.th = 0.05,
  merge.cons.bins = c("stitch", "zscores", "cbs", "no"), stitch.dist = NULL,
  max.gap.size = 1e+05, z.th = c("sdest", "consbins", "sdest2N"),
  norm.stats = NULL, min.normal.prop = 0.9, aneu.chrs = NULL,
  gc.df = NULL, sub.z = NULL, outfile.pv = NULL)

`files.df`	a data.frame with the paths to the different sample files (bin counts, Z-scores, ...). Here columns 'z' and 'fc' are used to retrieve Z-scores and fold changes.
`samp`	the name of the sample to analyze.
`out.pdf`	the name of the output pdf file.
`FDR.th`	the False Discovery Rate to use for the calls.
`merge.cons.bins`	how the bins should be merged. Default is 'stitch'. 'zscores' is another approach (see Details), 'no' means no bin merging.
`stitch.dist`	the maximal distance between two calls to be merged into one (if 'merge.cons.bins="stitch"'). If NULL (default), the bin size + 1 is used.
`max.gap.size`	the maximum gap between bins allowed for CBS. Default is 100 kb. Calls will not span gaps larger than this (e.g. centromere).
`z.th`	how the threshold for abnormal Z-scores is chosen. Default is 'sdest', which also uses the 'FDR.th' parameter. 'consbins' looks at the number of consecutive bins, see Details.
`norm.stats`	the name of the file with the normalization statistics ('norm.stats' in 'tn.norm' function) or directly a 'norm.stats' data.frame.
`min.normal.prop`	the minimum proportion of the regions expected to be normal. Default is 0.9. For cancers with many large aberrations, this number can be lowered. Maximum value accepted is 0.98.
`aneu.chrs`	the names of the chromosomes to remove because flagged as aneuploid. If NULL (default) all chromosomes are analyzed.
`gc.df`	a data.frame with the GC content in each bin, for the Z-score normalization. Columns required: chr, start, end, GCcontent. If NULL (default), no normalization is performed.
`sub.z`	if non-NULL, the number of bins in a sub-segment for Z-score null distribution estimation. Default is NULL. For highly rearranged genomes (e.g. cancer), try '1e4'.
`outfile.pv`	if non-NULL, the name of the file to write all the P-values (for all bins). Used in some analyses (e.g. annotate.with.parents).

Three approaches can be used to decide whether a bin is abnormal (parameter 'z.th'). By default ('sdest'), the standard deviation of the null Normal distribution is estimated by sequentially trimming the Z-score distribution and using an estimator for censored values. Once the Z-scores corresponding to the abnormal bins are trimmed out, the estimator reaches a plateau, which is used as the estimate of the null standard deviation. Using this parameter, P-values and Q-values are computed; abnormal bins are then defined by a user-defined FDR threshold on the Q-values.

An alternative approach, 'consbins', looks at the distribution of consecutive bins to define the best threshold on the Z-scores. A wide range of thresholds is explored. For each threshold, selected bins are stitched together if directly consecutive and the proportion of single and pair bins is computed. With a loose threshold (many selected bins), pairs of consecutive bins happen by chance. More stringent thresholds decrease the proportion of pairs and increase the proportion of single bins, until the threshold reaches true calls, which are more likely to be consecutive. The Z-score threshold is defined as the changepoint between the distributions of random and true calls.

Finally, 'sdest2N' is a variant of 'sdest' that fits two Gaussian distributions (both centered at 0). This approach is better suited when the tested sample is suspected not to be completely comparable to the reference samples. With two Gaussian distributions, a longer tail can be integrated into the null distribution, reducing potential false calls in the presence of a long tail.
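The P-value and Q-value step of the default approach can be sketched in base R. This is only an illustration under stated assumptions, not the PopSV implementation: the null standard deviation is approximated with the median absolute deviation ('mad') instead of the sequential trimming estimator described above, and the function name 'call_abnormal_sketch' and the simulated Z-scores are hypothetical.

```r
# Minimal sketch of 'sdest'-style calling (NOT the PopSV code).
# Assumption: the null SD is approximated with the MAD around 0 rather
# than with the sequential-trimming estimator used by the package.
call_abnormal_sketch <- function(z, FDR.th = 0.05) {
  sd.null <- mad(z, center = 0, na.rm = TRUE)       # robust SD of the null
  pv <- 2 * pnorm(-abs(z), mean = 0, sd = sd.null)  # two-sided P-values
  qv <- p.adjust(pv, method = "fdr")                # Benjamini-Hochberg
  data.frame(z = z, pv = pv, qv = qv, abnormal = qv < FDR.th)
}

# Simulated Z-scores: 1000 null bins plus 10 planted abnormal bins.
set.seed(1)
z <- c(rnorm(1000), rep(8, 5), rep(-7, 5))
res <- call_abnormal_sketch(z)
table(res$abnormal)
```

A robust estimator is used here because the abnormal bins would otherwise inflate a plain 'sd(z)' and make the null distribution too wide.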

Two approaches are available to merge bins with abnormal read coverage. 'stitch' simply stitches bins passing a user-defined significance threshold. In this approach, the stitching distance specifies the maximum distance between two bins that will be merged. By default the bin size is used, i.e. two abnormal bins will be merged if separated by at most one bin. The 'zscores' approach looks at the Z-scores of two consecutive bins: if the minimum (maximum) is significantly higher (lower) than in a simulated null distribution, these two bins will be merged to create a larger duplication (deletion).
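The stitching idea can be illustrated with a small base R function. This is a sketch under assumptions, not the package's code: 'stitch_sketch' is a hypothetical helper, the distance is measured from the end of one abnormal bin to the start of the next, and the direction of the signal (duplication or deletion) is ignored for brevity.

```r
# Minimal sketch of 'stitch'-style merging (NOT the PopSV code).
# Abnormal bins on the same chromosome are merged when the distance from
# the end of one bin to the start of the next is at most 'stitch.dist'.
stitch_sketch <- function(bins, stitch.dist) {
  bins <- bins[order(bins$chr, bins$start), ]
  n <- nrow(bins)
  gap <- c(Inf, bins$start[-1] - bins$end[-n])
  new.chr <- c(TRUE, bins$chr[-1] != bins$chr[-n])
  seg <- cumsum(new.chr | gap > stitch.dist)  # segment id for each bin
  data.frame(
    chr = as.vector(tapply(bins$chr, seg, function(x) x[1])),
    start = as.vector(tapply(bins$start, seg, min)),
    end = as.vector(tapply(bins$end, seg, max)),
    nb.bin.cons = as.vector(table(seg)),
    stringsAsFactors = FALSE
  )
}

# Toy abnormal bins of size 1000 (made-up coordinates).
bins <- data.frame(
  chr = c("1", "1", "1", "1", "2"),
  start = c(1, 2001, 6001, 7001, 1),
  end = c(1000, 3000, 7000, 8000, 1000),
  stringsAsFactors = FALSE
)
merged <- stitch_sketch(bins, stitch.dist = 1001)
merged
```

With a stitching distance of bin size + 1, the first two bins are merged across the single normal bin between them, while the 3 kb gap that follows starts a new call.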

For cancer samples, 'min.normal.prop' can be reduced, e.g. to 0.6. Aneuploid chromosomes can also be removed with 'aneu.chrs'. The function 'aneuploidy.flag' can help flag aneuploid chromosomes.

A data.frame with columns:

`chr, start, end`	the genomic region definition.
`z`	the Z-score.
`pv, qv`	the P-value and Q-value(~FDR).
`fc`	the copy number estimate (if 'fc' was not NULL).
`nb.bin.cons`	the number of consecutive bins (if the bins were merged, i.e. 'merge.cons.bins!="no"').
`cn2.dev`	the copy number deviation from the reference.
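As an illustration of how such a table might be used downstream, the sketch below filters calls at a stricter Q-value threshold and labels them by the sign of the Z-score. The table 'res.df' and all its values are made up for the example; only the column names come from the list above, and the added columns 'type' and 'size.kb' are hypothetical.

```r
# Toy result table with the documented columns (all values are made up).
res.df <- data.frame(
  chr = c("1", "1", "2"),
  start = c(10001, 50001, 20001),
  end = c(12000, 51000, 25000),
  z = c(-6.2, 1.1, 7.5),
  pv = c(1e-9, 0.27, 1e-13),
  qv = c(1e-7, 0.4, 1e-11),
  fc = c(0.5, 1.02, 1.5),
  nb.bin.cons = c(2, 1, 5),
  stringsAsFactors = FALSE
)

# Keep calls passing a stricter Q-value threshold.
calls <- subset(res.df, qv < 0.01)
# Negative Z-scores indicate lower coverage (deletion), positive ones
# higher coverage (duplication).
calls$type <- ifelse(calls$z < 0, "deletion", "duplication")
calls$size.kb <- (calls$end - calls$start + 1) / 1000
calls[, c("chr", "start", "end", "type", "size.kb", "nb.bin.cons")]
```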

Jean Monlong

jmonlong/PopSV documentation built on Sept. 15, 2019, 9:29 p.m.