analyze_peaks: QC by peak-detection algorithms.
In caravagnalab/CNAqc: CNAqc - Copy Number Analysis quality check

analyze_peaks

R Documentation

Description

CNAqc uses peak-detection algorithms to QC data; all leverage the idea that the positions of VAF peaks are known for mutations mapped to a segment with given major/minor allele copies. CNAqc therefore computes expected peaks and compares them to peaks detected from data. The theory works with minor modifications for both clonal and subclonal segments. Three distinct algorithms are available, each one working with a different type of copy number segment; all analyses are called by this function, which takes care of running all the suitable algorithms based on the input data.

* Simple clonal segments (1:0, 2:0, 1:1, 2:1, 2:2). This QC measures an error for the precision of the current purity estimate, failing a whole sample or a subset of segments if the value is over a desired maximum value. The error is determined as a linear combination of the distances between VAF peaks and their theoretical expectations. For this analysis, all mutations mapping across any segment with the same major/minor alleles are pooled. Note that this score can be used to select among alternative copy number solutions, i.e., favouring a solution with a lower score. The peaks are determined i) via peak-detection algorithms from the peakPick package, applied to a Gaussian kernel density estimate (gKDE) smooth of the VAF distribution, and ii) via the Bmix Binomial mixture model. Peak-matching (i.e., determining which data peak is closest to the expected peak) has two possible implementations: one matching the closest peaks by Euclidean distance, the other ranking peaks from higher to lower VAFs and matching the higher-VAF peaks first.

* Complex clonal segments. The QC procedure for these “general” segments uses only the KDE and, as for simple segments, pools all mutations mapping across any segment with the same major/minor alleles. In this case, no segment-level or sample-level scores are produced, and a complex segment with many matched peaks is likely to be correct.

* Subclonal simple segments. The QC procedure for these segments uses the KDE and considers 2 subclones with distinct mixing proportions. Unlike for clonal CNAs, however, the analysis is carried out at the level of each segment, i.e., without pooling segments with the same karyotypes. This makes it possible to use subclonal calls from callers that report segment-specific CCF values, e.g., Battenberg. The model in CNAqc ranks the proposed evolutionary alternatives (linear versus branching) based on the number of matched peaks. A subclonal segment with many matched peaks is likely to be correct.
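
The expected peaks for simple clonal segments can be illustrated with a short base-R sketch. It assumes the standard mixture of tumour cells (at the given purity) with a diploid normal contaminant, and that mutations in a segment occur at multiplicity 1 and, when the major allele has more than one copy, at multiplicity equal to the major allele count. The helper names `expected_vaf` and `expected_peaks` are hypothetical and are not part of the CNAqc API; the package's internal computation may differ in detail.

```r
# Expected VAF of a clonal mutation, assuming a diploid normal contaminant.
# purity: tumour purity in (0, 1]; major/minor: allele copies of the segment;
# multiplicity: number of tumour copies carrying the mutation.
expected_vaf <- function(purity, major, minor, multiplicity) {
  tumour_copies <- major + minor
  (multiplicity * purity) / (2 * (1 - purity) + purity * tumour_copies)
}

# Expected peaks for a set of karyotypes in "Major:minor" notation.
# Assumption: multiplicities are 1 and, if major > 1, also `major`.
expected_peaks <- function(purity,
                           karyotypes = c("1:0", "1:1", "2:0", "2:1", "2:2")) {
  do.call(rbind, lapply(karyotypes, function(k) {
    alleles <- as.integer(strsplit(k, ":", fixed = TRUE)[[1]])
    mult <- unique(c(1L, alleles[1]))
    data.frame(
      karyotype = k,
      multiplicity = mult,
      peak = expected_vaf(purity, alleles[1], alleles[2], mult)
    )
  }))
}

expected_peaks(purity = 0.8)
```

Under these assumptions a diploid heterozygous segment (1:1) at 80% purity has a single expected peak at VAF 0.4, while a 2:1 segment has two expected peaks, one per multiplicity.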

Results from peak-based QC are available via plot_peaks_analysis, and stored inside the input object.
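
To give a feel for the KDE-based detection described above, the following base-R sketch smooths simulated VAFs with a Gaussian kernel and picks local maxima of the density. It is an illustration only: the data are simulated, local maxima are found with a simple sign-change rule rather than the peakPick package, and the 10% density threshold is a hypothetical filter, not the one CNAqc applies.

```r
set.seed(1)

# Simulated VAFs for illustration only: two Binomial peaks at coverage 100
depth <- 100
vaf <- c(rbinom(600, depth, 0.2), rbinom(400, depth, 0.4)) / depth

# Gaussian KDE; `adjust` plays the role of the `kernel_adjust` argument
kde <- density(vaf, kernel = "gaussian", adjust = 1, from = 0, to = 1, n = 512)

# Local maxima: grid points where the slope changes from positive to negative
is_peak <- which(diff(sign(diff(kde$y))) == -2) + 1
peaks <- data.frame(x = kde$x[is_peak], y = kde$y[is_peak])

# Drop low-density bumps (hypothetical threshold, not CNAqc's filter)
peaks <- peaks[peaks$y > 0.1 * max(peaks$y), ]
peaks
```

Larger values of `adjust` smooth the density more aggressively, which removes spurious bumps but can also merge nearby peaks.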

Usage

analyze_peaks(
  x,
  karyotypes = c("1:0", "1:1", "2:0", "2:1", "2:2"),
  min_karyotype_size = 0,
  min_absolute_karyotype_mutations = 100,
  p_binsize_peaks = 0.005,
  matching_epsilon = NULL,
  purity_error = 0.05,
  VAF_tolerance = 0.015,
  n_bootstrap = 1,
  kernel_adjust = 1,
  matching_strategy = "closest",
  KDE = TRUE,
  starting_state_subclonal_evolution = "1:1",
  cluster_subclonal_CCF = FALSE,
  min_VAF = 0
)

Arguments

`x`	A CNAqc object.
`karyotypes`	For clonal simple CNAs, the list of segments to test; by default LOH regions (A, AA), diploid regions (AB), and amplification regions (AAB, AABB) are tested, corresponding to `'1:0', '2:0', '1:1', '2:1', '2:2'` in "Major:minor" notation.
`min_karyotype_size`	For clonal simple CNAs, a filter for the segments to test. The segment size is defined by the number of mutations mapped; this cut is on the proportion relative to the whole set of segments one wishes to analyse (defined by 'karyotypes'). For example, by setting 'min_karyotype_size = 0.2' one would QC clonal simple CNAs that contain at least 20% of the mutations. The default of this parameter is '0' (all QCed).
`min_absolute_karyotype_mutations`	For clonal simple CNAs, as `min_karyotype_size` but with a cut measured on absolute mutation counts. For example, by setting 'min_absolute_karyotype_mutations = 150' one would QC clonal simple CNAs that contain at least '150' mutations. The default of this parameter is '100'.
`p_binsize_peaks`	For clonal simple CNAs, a detected peak is filtered out if fewer than `p_binsize_peaks * N` mutations map to it. The value `N` is obtained by counting all mutations that map to any peak. By default this parameter is '0.005'.
`matching_epsilon`	Deprecated parameter.
`purity_error`	For clonal simple CNAs, the purity error tolerance to determine QC pass or fail. This can be set automatically using function `auto_tolerance` to optimise the analysis based on a desired rate of false-positive matches, as a function of the data coverage and (putative) purity.
`VAF_tolerance`	For clonal simple CNAs, the tolerance, applied to the raw VAF values, used when comparing band overlaps.
`n_bootstrap`	For clonal simple CNAs, the number of times peak detection is bootstrapped (by default 1). This sometimes helps to find peaks that are visually observable but fail to be detected by the underlying peak-detection heuristics.
`kernel_adjust`	For KDE-based matches, the `adjust` parameter of the density estimate; see `density`. Note that a Gaussian kernel is used (`kernel = 'gaussian'`).
`matching_strategy`	For clonal simple CNAs, if `"closest"` the closest peak will be used to match the expected peak. If `"rightmost"` peaks are matched from right to left (the highest-VAF peak gets matched first); this strategy is more correct in principle but works only if there are no spurious peaks in the estimated density. By default the `"closest"` strategy is used.
`KDE`	Deprecated parameter.
`starting_state_subclonal_evolution`	For subclonal simple CNAs, the starting state to determine linear versus branching evolutionary models. By default this is a heterozygous diploid '1:1' state.
`cluster_subclonal_CCF`	For subclonal segments, should the tool try to merge segments with similar CCF and the same copy number alteration?
`min_VAF`	Only mutations with VAF higher than the supplied cut-off are used for the QC; no mutations are removed from the final object. If the data are multi-sample (multi-region or longitudinal data), it is strongly advised to set this to 0 (default) to avoid including private mutations of samples in the peak detection.
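
The difference between the two values of `matching_strategy` can be sketched in base R. This is one plausible reading of the argument descriptions above, not CNAqc's internal code: the function `match_peaks` and its inputs are hypothetical, and the "rightmost" branch simply pairs expected and detected peaks by rank from the highest VAF downwards.

```r
# Match expected peaks to detected peaks.
# expected, detected: numeric vectors of VAF positions.
match_peaks <- function(expected, detected,
                        strategy = c("closest", "rightmost")) {
  strategy <- match.arg(strategy)
  if (strategy == "closest") {
    # Each expected peak takes the nearest detected peak
    idx <- vapply(expected, function(e) which.min(abs(detected - e)), integer(1))
    return(data.frame(expected = expected, matched = detected[idx]))
  }
  # "rightmost": pair peaks by rank, highest VAF first
  e <- sort(expected, decreasing = TRUE)
  d <- sort(detected, decreasing = TRUE)
  n <- min(length(e), length(d))
  data.frame(expected = e[seq_len(n)], matched = d[seq_len(n)])
}

expected <- c(0.2857, 0.5714)    # e.g. a 2:1 segment at about 80% purity
detected <- c(0.27, 0.55, 0.70)  # 0.70 stands in for a spurious peak

match_peaks(expected, detected, "closest")
match_peaks(expected, detected, "rightmost")
```

In this toy input the spurious peak at 0.70 is ignored by the closest-peak rule but captured by the rank-based rule, which shifts every subsequent match. This is the sense in which the rightmost strategy relies on a density without spurious peaks.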

Value

An object of class cnaqc, modified to hold the results from this analysis. For every type of segment analyzed, tables with summary peaks are available in x$peaks_analysis. The most helpful table is usually the one for simple clonal CNAs, 'x$peaks_analysis$matches', which reports several pieces of information:

- 'mutation_multiplicity', the number of copies of the mutation (i.e., phasing information);
- 'peak', 'x', 'y', the expected peak and the matched peak ('x' and 'y');
- 'offset', 'weight' and 'score', the factors of the final score;
- 'QC', a pass/fail status for the peak.

The overall sample-level QC result is available in 'x$peaks_analysis$QC'.
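
As a rough illustration of how 'offset', 'weight' and 'score' might combine into a sample-level result, the sketch below uses a hand-made table with hypothetical values. It assumes that each peak's score is the product of its offset and weight, that the sample-level score is the sum of these, and that the sample passes when the absolute score is within `purity_error`. These are assumptions drawn from the "linear combination" wording in the Description; the exact rule used by the package may differ.

```r
# Hypothetical values shaped like a few columns of x$peaks_analysis$matches
matches <- data.frame(
  karyotype = c("1:1", "2:1", "2:1"),
  mutation_multiplicity = c(1, 1, 2),
  offset = c(0.010, -0.020, 0.030),  # signed distance, expected vs matched peak
  weight = c(0.50, 0.30, 0.20)       # assumed to sum to 1 across peaks
)

purity_error <- 0.05

# Assumed scoring: weighted offsets, summed to a sample-level score
matches$score <- matches$offset * matches$weight
sample_score <- sum(matches$score)
sample_QC <- if (abs(sample_score) <= purity_error) "PASS" else "FAIL"

matches
sample_QC
```

Because the score is a weighted sum, peaks supported by few mutations contribute little, so a poorly matched minor peak does not by itself fail a sample under these assumptions.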

Examples

data('example_dataset_CNAqc', package = 'CNAqc')
x = init(
  mutations = example_dataset_CNAqc$mutations,
  cna = example_dataset_CNAqc$cna,
  purity = example_dataset_CNAqc$purity
)

# Note the run outputs
x = analyze_peaks(x)

# More precise messages
print(x)

# The tables with summary results per peak and segment
print(x$peaks_analysis)

# Analysis where simple clonal segments are matched with an alternative algorithm.
x = analyze_peaks(x, matching_strategy = "rightmost")

print(x)

caravagnalab/CNAqc documentation built on June 2, 2025, 1:21 a.m.