knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

options(crayon.enabled = FALSE)
library(CNAqc)

# We work with the PCAWG object
x = CNAqc::example_PCAWG

print(x)

Peak analysis

CNAqc uses peak-detection algorithms to QC data. All of them rely on the fact that, for mutations mapped to a segment with given minor/major allele copy numbers, the positions of the VAF peaks are known in advance. CNAqc therefore computes the expected peaks and compares them to the peaks detected from data. With minor modifications, the theory applies to both clonal and subclonal segments.
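To make the idea of an expected peak concrete, here is a minimal sketch of the standard relation between purity, copy number and VAF, assuming a diploid normal contaminant. The function names `expected_vaf` and `expected_peaks` are hypothetical helpers written for this illustration and are not part of the CNAqc API; the multiplicities considered (1, and the major allele count when it exceeds 1) are an assumption that only covers the simple karyotypes 1:0, 2:0, 1:1, 2:1 and 2:2.

```r
# Expected VAF of a mutation carried by `m` tumour copies of a segment,
# assuming the normal contaminant is diploid (2 copies, none mutated).
expected_vaf <- function(m, minor, major, purity) {
  tumour_copies <- minor + major
  (m * purity) / (2 * (1 - purity) + tumour_copies * purity)
}

# Hypothetical helper: expected peaks for a simple clonal karyotype.
# Multiplicities are 1 and, when larger than 1, the major allele count.
expected_peaks <- function(minor, major, purity) {
  m <- unique(c(1, major))
  data.frame(
    karyotype    = paste0(major, ":", minor),
    multiplicity = m,
    vaf          = expected_vaf(m, minor, major, purity)
  )
}

# Example: a 2:1 segment in a tumour with 80% purity.
peaks_21 <- expected_peaks(minor = 1, major = 2, purity = 0.8)
print(peaks_21)
```

In this sketch a 2:1 segment yields two expected peaks, one for mutations on a single copy and one for mutations on both copies of the amplified allele.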

Three distinct algorithms are available, each working with a different type of copy number segment; all analyses are run by the function analyze_peaks.

x = analyze_peaks(x)

# Shows results
print(x)

Simple clonal segments (1:0, 2:0, 1:1, 2:1, 2:2)

This QC measures an error for the precision of the current purity estimate, and fails a whole sample or a subset of segments if the value exceeds a desired maximum. The error is computed as a linear combination of the distances between VAF peaks and their theoretical expectations. For this analysis, all mutations mapping to any segment with the same major/minor alleles are pooled.
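The scoring idea can be sketched as follows. This is an illustration only: the table, its column names, the weights (taken here to be proportional to the number of supporting mutations) and the threshold `max_error` are all assumptions of the sketch, not values or definitions taken from CNAqc.

```r
# Hypothetical summary: one row per expected peak, with the matched data peak
# and the number of mutations supporting it.
peaks <- data.frame(
  karyotype = c("1:1", "2:1", "2:1"),
  expected  = c(0.40, 0.29, 0.57),
  observed  = c(0.42, 0.30, 0.55),
  n_mut     = c(600, 200, 200)
)

# Signed distance of each data peak from its theoretical expectation.
peaks$offset <- peaks$observed - peaks$expected

# Weights of the linear combination (assumed proportional to mutation counts).
peaks$weight <- peaks$n_mut / sum(peaks$n_mut)

# Sample-level score: weighted sum of the offsets.
score <- sum(peaks$weight * peaks$offset)

# Fail the sample if the absolute score exceeds the maximum tolerated error.
max_error <- 0.05  # hypothetical threshold
qc <- if (abs(score) <= max_error) "PASS" else "FAIL"

print(score)
print(qc)
```

With signed offsets, peaks shifted in opposite directions partly cancel out; a variant based on absolute distances would penalise every mismatch regardless of direction.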

Note: the score can be used to select among alternative copy number solutions, i.e., favouring a solution with lower score.

The peaks are determined via:

Peak-matching (i.e., determining what data peak is closest to the expected peak) has two possible implementations:

Results from peak-based QC are available via plot_peaks_analysis.

plot_peaks_analysis(x)

Gray panels are placeholders for segments among 1:0, 2:0, 1:1, 2:1, 2:2 that are not available for the sample. Each vertical dashed line is an expected peak, and the band around it is the tolerance used to match peaks (based on purity_error, adjusted for segment ploidy and tumour purity). Each dot is a peak detected from data, with a fixed tolerance band around it.
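The logic behind such a plot, detecting peaks from data and matching them to expected peaks within a tolerance, can be sketched with base R. This is not the CNAqc implementation: the data are simulated, the peak finder is a plain local-maximum search on `stats::density`, and a single fixed `tolerance` replaces the purity-adjusted bandwidth described above.

```r
set.seed(1)

# Synthetic VAFs with two modes, near 0.25 and 0.50.
vaf <- c(rnorm(500, mean = 0.25, sd = 0.03),
         rnorm(500, mean = 0.50, sd = 0.03))

# Kernel density estimate of the VAF distribution on [0, 1].
d <- density(vaf, from = 0, to = 1, n = 512)

# Local maxima: points where the slope changes from positive to negative.
idx <- which(diff(sign(diff(d$y))) == -2) + 1

# Discard negligible bumps (below 10% of the highest density value).
idx <- idx[d$y[idx] > 0.1 * max(d$y)]
data_peaks <- d$x[idx]

# Expected peaks and a fixed matching tolerance (both assumed for the example).
expected  <- c(0.25, 0.50)
tolerance <- 0.05

# Match each expected peak to the closest data peak.
closest <- vapply(
  expected,
  function(e) data_peaks[which.min(abs(data_peaks - e))],
  numeric(1)
)
matched <- abs(closest - expected) <= tolerance

print(data.frame(expected, closest, matched))
```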

Note that:

Options of the function plot_peaks_analysis allow the plots to be separated.

Note: a chromosome-level analysis is possible by using function split_by_chromosome to separate a CNAqc object into chromosomes, and then running a standard analysis on each chromosome.

Complex clonal segments

The QC procedure for these "general" segments uses only the gKDE and, as for simple segments, pools all mutations mapping across any segment with the same major/minor alleles.

plot_peaks_analysis(x, what = 'general')

The plot is similar to the one for simple segments, but no segment-level or sample-level scores are produced. A complex segment with many matched peaks is likely to be correct.

Subclonal simple segments

The QC procedure for these segments uses the gKDE and considers 2 subclones with distinct mixing proportions. Unlike for clonal CNAs, however, the analysis is carried out at the level of each segment, i.e., without pooling segments with the same karyotypes. This makes it possible to use subclonal calls from callers that report segment-specific CCF values, e.g., Battenberg.
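For a segment shared by two subclones, the expected VAF follows from mixing the two populations. The sketch below assumes a diploid normal contaminant and two subclones whose proportions among tumour cells sum to 1; the function `expected_vaf_subclonal` and its arguments are hypothetical and not part of the CNAqc API. Which pairs of multiplicities are admissible depends on the evolutionary model under consideration, which is not enumerated here.

```r
# Expected VAF of a mutation with multiplicity m1 in subclone 1 and m2 in
# subclone 2. n1 and n2 are the total copy numbers of the segment in each
# subclone, ccf1 is the proportion of tumour cells in subclone 1.
expected_vaf_subclonal <- function(m1, m2, n1, n2, ccf1, purity) {
  ccf2   <- 1 - ccf1
  mutant <- purity * (ccf1 * m1 + ccf2 * m2)
  total  <- 2 * (1 - purity) + purity * (ccf1 * n1 + ccf2 * n2)
  mutant / total
}

# Example: subclone 1 is 1:1 (60% of tumour cells), subclone 2 is 2:1,
# in a pure tumour.
# Mutation on the allele that is amplified in subclone 2:
shared_amplified <- expected_vaf_subclonal(m1 = 1, m2 = 2, n1 = 2, n2 = 3,
                                           ccf1 = 0.6, purity = 1)
# Mutation private to subclone 1:
private_first <- expected_vaf_subclonal(m1 = 1, m2 = 0, n1 = 2, n2 = 3,
                                        ccf1 = 0.6, purity = 1)

print(c(shared_amplified = shared_amplified, private_first = private_first))
```

When both subclones have the same copy number and the same multiplicity, the expression reduces to the clonal case.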

plot_peaks_analysis(x, what = 'subclonal')

The visual layout of this plot is the same as that for complex clonal CNAs; note that the facet reports the distinct evolutionary models that have been generated to QC subclonal CNAs. The model in CNAqc ranks the proposed evolutionary alternatives (linear versus branching) based on the number of matched peaks. A subclonal segment with many matched peaks is likely to be correct.
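Ranking alternatives by the number of matched peaks can be sketched as below. The table and its columns (`model`, `matched`) are hypothetical and only illustrate the counting step; they do not reproduce the structure of the tables stored by CNAqc.

```r
# Hypothetical per-peak results for one subclonal segment under two models.
matches <- data.frame(
  model   = c("linear", "linear", "linear", "branching", "branching"),
  matched = c(TRUE, TRUE, FALSE, TRUE, FALSE)
)

# Count matched peaks per model, then sort from most to least matched.
ranking <- aggregate(matched ~ model, data = matches, FUN = sum)
ranking <- ranking[order(-ranking$matched), ]

# The top-ranked model is the one with the most matched peaks.
best <- as.character(ranking$model[1])

print(ranking)
print(best)
```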

Summary results

For every type of segment analysed, tables summarising the peaks are available in x$peaks_analysis.

# Simple clonal CNAs - each segment with `discarded = FALSE` has been analysed
x$peaks_analysis$matches

# Complex clonal CNAs
x$peaks_analysis$general$expected_peaks

# Subclonal CNAs
x$peaks_analysis$subclonal$expected_peaks

The most helpful table is usually the one for simple clonal CNAs, x$peaks_analysis$matches, which reports several pieces of information:

The overall sample-level QC result - "PASS"/"FAIL" - is available in

x$peaks_analysis$QC

You can summarise QC results in a plot.

plot_qc(x)


caravagnalab/CNAqc documentation built on Oct. 31, 2024, 3:54 a.m.