View source: R/analyze_peaks.R
analyze_peaks | R Documentation |
CNAqc uses peak-detection algorithms to QC data; all leverage the idea that VAF peaks are known for mutations mapped to a segment with given minor/major allele copies. CNAqc therefore computes expected peaks and compares them to the peaks detected from data. The theory works, with minor modifications, for both clonal and subclonal segments. Three distinct algorithms are available, each working with a different type of copy number segment; this function calls all analyses, running every algorithm that suits the input data.
* Simple clonal segments (1:0, 2:0, 1:1, 2:1, 2:2). This QC measures an error for the precision of the current purity estimate, failing a whole sample or a subset of segments if the value exceeds a desired maximum. The error is determined as a linear combination of the distances between VAF peaks and their theoretical expectations. For this analysis, all mutations mapping across any segment with the same major/minor alleles are pooled. Note that this score can be used to select among alternative copy number solutions, i.e., favouring the solution with the lower score. The peaks are determined i) via peak-detection algorithms from the peakPick package, applied to a Gaussian kernel density estimate (gKDE) smooth of the VAF distribution, and ii) via the Bmix Binomial mixture model. Peak-matching (i.e., determining which data peak is closest to the expected peak) has two possible implementations: one matching the closest peaks by Euclidean distance, the other ranking peaks from higher to lower VAFs and prioritising the higher ones.
* Complex clonal segments. The QC procedure for these “general” segments uses only the KDE and, as for simple segments, pools all mutations mapping across any segment with the same major/minor alleles. In this case, no segment-level or sample-level scores are produced; a complex segment with many matched peaks is likely to be correct.
* Subclonal simple segments. The QC procedure for these segments uses the KDE and considers two subclones with distinct mixing proportions. Unlike for clonal CNAs, however, the analysis here is carried out at the level of each segment, i.e., without pooling segments with the same karyotypes. This makes it possible to use subclonal calls from callers that report segment-specific CCF values, e.g., Battenberg. The model in CNAqc ranks the proposed evolutionary alternatives (linear versus branching) based on the number of matched peaks. A subclonal segment with many matched peaks is likely to be correct.
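The expected peaks mentioned above follow from simple allele counting. The sketch below is a minimal illustration in base R, assuming the standard model of a tumour at a given purity mixed with diploid normal cells; the helper name `expected_vaf` is hypothetical and is not part of the CNAqc API.

```r
# Hypothetical helper (not a CNAqc function): expected VAF of a clonal mutation
# carried on `m` copies, in a segment with `major` + `minor` tumour copies,
# for a sample with tumour purity `purity` and diploid normal contamination.
expected_vaf <- function(m, major, minor, purity) {
  tumour_alleles <- purity * (major + minor)  # alleles contributed by tumour cells
  normal_alleles <- 2 * (1 - purity)          # alleles contributed by normal cells
  (m * purity) / (normal_alleles + tumour_alleles)
}

purity <- 0.8

# Diploid heterozygous segment (1:1): a single peak for multiplicity 1
peak_1_1 <- expected_vaf(m = 1, major = 1, minor = 1, purity = purity)

# Amplified segment (2:1): two peaks, for multiplicities 1 and 2
peaks_2_1 <- expected_vaf(m = c(1, 2), major = 2, minor = 1, purity = purity)

print(peak_1_1)
print(peaks_2_1)
```

A wrong purity estimate shifts every expected peak away from the peaks observed in the data, which is why the distance between the two can serve as a purity error.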
Results from peak-based QC are available via plot_peaks_analysis, and stored inside the input object.
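To make the simple clonal QC more concrete, the following base R sketch simulates VAFs for a 2:1 segment, detects peaks on a kernel density estimate, matches each expected peak to the closest detected peak, and combines the offsets linearly. It is only an illustration under stated assumptions: a plain local-maximum search stands in for the peakPick algorithms, the simulated data and the 10% height filter are arbitrary, and the weights are assumed to be the fraction of mutations under each peak. None of this reproduces the CNAqc internals.

```r
set.seed(1)
purity <- 0.8
depth  <- 100  # assumed constant sequencing depth, for simplicity

# Expected peaks for a 2:1 segment (mutation multiplicities 1 and 2)
expected <- c(1, 2) * purity / (2 * (1 - purity) + purity * 3)

# Simulated VAFs: 600 mutations on one copy, 400 on two copies
vaf <- c(rbinom(600, depth, expected[1]),
         rbinom(400, depth, expected[2])) / depth

# Gaussian KDE of the VAF distribution
d <- density(vaf, adjust = 1, from = 0, to = 1)

# Local maxima of the density (a simple stand-in for peakPick),
# discarding low bumps below 10% of the highest density value
is_max <- which(diff(sign(diff(d$y))) == -2) + 1
is_max <- is_max[d$y[is_max] > 0.1 * max(d$y)]
peaks  <- d$x[is_max]

# "Closest" matching: for each expected peak, take the nearest detected peak
matched <- sapply(expected, function(e) peaks[which.min(abs(peaks - e))])

# Linear combination of the offsets, weighted by the assumed mutation fractions
offset <- matched - expected
weight <- c(0.6, 0.4)
score  <- sum(weight * offset)

print(data.frame(expected, matched, offset, weight))
print(score)
```

With a correct purity the offsets stay close to zero; rerunning the sketch with `expected` computed from a wrong purity value gives a larger absolute score.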
analyze_peaks(
x,
karyotypes = c("1:0", "1:1", "2:0", "2:1", "2:2"),
min_karyotype_size = 0,
min_absolute_karyotype_mutations = 100,
p_binsize_peaks = 0.005,
matching_epsilon = NULL,
purity_error = 0.05,
VAF_tolerance = 0.015,
n_bootstrap = 1,
kernel_adjust = 1,
matching_strategy = "closest",
KDE = TRUE,
starting_state_subclonal_evolution = "1:1",
cluster_subclonal_CCF = FALSE
)
x | A CNAqc object. |
karyotypes | For clonal simple CNAs, the list of segments to test; by default LOH regions (A, AA), diploid regions (AB), and amplification regions (AAB, AABB) are tested, corresponding to |
min_karyotype_size | For clonal simple CNAs, a filter for the segments to test. The segment size is defined by the number of mutations mapped; this cut is on the proportion relative to the whole set of segments one wishes to analyse (defined by 'karyotypes'). For example, by setting 'min_karyotype_size = 0.2' one would QC clonal simple CNAs that contain at least 20% of the mutations. The default of this parameter is '0' (all QCed). |
min_absolute_karyotype_mutations | For clonal simple CNAs, as |
p_binsize_peaks | For clonal simple CNAs, peaks detected will be filtered if, in a peak, we map less than |
matching_epsilon | Deprecated parameter. |
purity_error | For clonal simple CNAs, the purity error tolerance to determine QC pass or fail. This can be set automatically using function |
VAF_tolerance | For clonal simple CNAs, a tolerance, applied to the raw VAF values, used when comparing band overlaps. |
n_bootstrap | For clonal simple CNAs, the number of times peak detection is bootstrapped (by default 1). This sometimes helps to find peaks that are visually observable but fail to be detected by the underlying peak-detection heuristics. |
kernel_adjust | For KDE-based matches the adjust density parameter; see |
matching_strategy | For clonal simple CNAs, if |
KDE | Deprecated parameter. |
starting_state_subclonal_evolution | For subclonal simple CNAs, the starting state to determine linear versus branching evolutionary models. By default this is a heterozygous diploid '1:1' state. |
cluster_subclonal_CCF | For subclonal segments, should the tool try to merge segments with similar CCF and the same copy number alteration? |
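The proportion-based filter described for 'min_karyotype_size' can be pictured with a few lines of base R. This is a sketch with made-up mutation counts per karyotype, and it assumes the comparison is "greater than or equal to" the threshold; it does not call CNAqc and may differ from what the package does internally.

```r
# Hypothetical number of mutations mapped to each simple clonal karyotype
n_mutations <- c("1:0" = 50, "1:1" = 650, "2:0" = 30, "2:1" = 250, "2:2" = 20)

# Proportion of mutations per karyotype, relative to all karyotypes analysed
proportion <- n_mutations / sum(n_mutations)

# Keep only karyotypes holding at least 20% of the mutations
min_karyotype_size <- 0.2
tested <- names(proportion)[proportion >= min_karyotype_size]

print(round(proportion, 3))
print(tested)
```

With the default 'min_karyotype_size = 0' every listed karyotype would pass this filter.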
An object of class cnaqc, modified to hold the results from this analysis. For every type of segment analyzed, tables with summary peaks are available in x$peaks_analysis. The most helpful table is usually the one for simple clonal CNAs, 'x$peaks_analysis$matches', which reports several pieces of information:
* 'mutation_multiplicity', the number of copies of the mutation (i.e., phasing information);
* 'peak', 'x', 'y', the expected peak and the matched peak ('x' and 'y');
* 'offset', 'weight' and 'score', the factors of the final score;
* 'QC', a pass/fail status for the peak.
The overall sample-level QC result is available in 'x$peaks_analysis$QC'.
auto_tolerance, plot_peak_analysis and plot_QC.
data('example_dataset_CNAqc', package = 'CNAqc')
x = init(mutations = example_dataset_CNAqc$mutations, cna = example_dataset_CNAqc$cna, purity = example_dataset_CNAqc$purity)
# Note the run outputs
x = analyze_peaks(x)
# More precise messages
print(x)
# The tables with summary results per peak and segment
print(x$peaks_analysis)
# Analysis where simple clonal segments are matched with an alternative algorithm.
x = analyze_peaks(x, matching_strategy = "rightmost")
print(x)