In caravagnalab/CNAqc: CNAqc - Copy Number Analysis quality check

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

options(crayon.enabled=F)

library(CNAqc)

require(dplyr)

Input format

CNAqc notation:

Major:minor denotes a CNA segments with Major/minor copies of the major/minor allele. We sometimes call "1:1" the karyotype or the copy state of a segment.
Clonal denotes something (a CNA, or a mutaiton) that is found at clonality 100% or, so to say, in all tumour cells; subclonal is something that is found in a subset of the tumour cells.

CNAqc comes with a template dataset.

# Load template data
data('example_dataset_CNAqc', package = 'CNAqc')

Somatic mutations

These fields are required for somatic mutations:

the mutation location, as chr, from, to.
the mutation reference and alternative alleles ref and alt;
the total number of reads covering the mutated base(s), DP (depth);
the total number of reads covering only the mutant allele(s), NV (number of reads with variant);
the Variant Allele Frequency, VAF, defined as NV/DP.

Chromosome names and alleles should be in character format; chromosomes must be in the format chr1, chr2, etc..

# Example input SNVs
example_dataset_CNAqc$mutations %>%
  dplyr::select(chr, from, to, # Genomic coordinates
         ref, alt,      # Alleles (reference and alternative)
         DP, NV, VAF    # Read counts (depth, number of variant reads, tumour VAF)
         ) %>%
         print()

Adding driver mutations

Optionally, you can annotate driver mutations by adding the following columns to your data:

is_driver: whether the mutation is a driver, or not;
driver_label: the label to be shown in the plots that report also drivers (e.g., BRAF V600E could be a label).

example_dataset_CNAqc$mutations %>%
  dplyr::select(chr, from, to, ref, alt, is_driver, driver_label) %>%
  filter(is_driver) %>% 
print()

Copy number segments

CNAqc distinguishes between 3 types of copy number segments:

clonal simple CNAs:
"1:0" loss of heterzygosity (LOH);
"2:0" copy neutral LOH;
"1:1" diploid heterozygous (assumed to be the normal reference);
"2:1" trisomy;
"2:2" tetraploidy.
clonal complex CNAs:
any other clonal CNA that is not simple;
subclonal CNAs:
any mixture of 2 subclones, where each one of the subclones is defined by a simple CNA.

These fields are required for all types of CNAs:

the segment location, as chr, from and to;
the segment absolute integer number of copies of the major and minor alleles, as Major and minor;

Adding subclonal copy numbers

Optionally, you can annotate also subclonal CNAs.

To do this first you annotate the Cancer Cell Fraction (CCF) CCF for each input segment as an extra column in the dataframe: segments with CCF = 1 are clonal, otherwise subclonal;

# Example input CNA
print(
  example_dataset_CNAqc$cna %>% 
        select(
          chr, from, to, # Genomic coordinates
          Major, minor  # Number of copies of major/ and minor allele (B-allele)
        )
  )

Note: the CCF of a segment can only be computed by callers that support subclonal segments. If there are no subclonal CNAs the CCF column can be omitted. In that case CNAqc assumes all segments to be clonal and assigns CCF = 1.

If you wish to use subclonal CNAs, further columns are required.

Major_2 and minor_2 reporting the major and minor alleles for the second clone.

The CNAqc model captures a mixture of two subclones, one with segment Major:minor and CCF CCF (which is compulsory), and another with segment Major_2:minor_2 and CCF 1 - CCF.

The values of Major_2 and minor_2 for clonal segments (CCF = 1) can be NA and will not be used.

Tumour purity

Tumour purity, defined as the percentage of reads coming from tumour cells must be a value in $[0, 1]$.

# Example purity
print(example_dataset_CNAqc$purity)

Initialisation of a new dataset

To use CNAqc, you need to initialize a cnaqc S3 object with the initialisation function init.

This function will check input formats, and will map mutations to CNA segments. This function does not subset the data and retains all and only the mutations that map on top of a CNA segment.

When you create a dataset it is required to explicit the reference genome for the assembly (see below).

# Use SNVs, CNAs and tumour purity (hg19 reference, see below)
x = init(
  mutations = example_dataset_CNAqc$mutations, 
  cna = example_dataset_CNAqc$cna,
  purity = example_dataset_CNAqc$purity,
  ref = 'hg19'
  )

The summary of x can be print to provide a number of usefull information.

print(x)

Subsetting data

You can subset randomly the data; if drivers are annotated, they can be forced to stay in.

y_5000 = subsample(x, N = 5000, keep_drivers = TRUE)

# 5000 + the ranomd entries that we sampled before
print(y_5000)

You can also subset data by karyotype of the segments, and by total copy number of the segment.

Both subset functions do not keep drivers that map off from the selected segments.

# Triploid and copy-neutral LOH segments 
y_tripl_cnloh = subset_by_segment_karyotype(x, karyotypes = c('2:1', '2:0'))

print(y_tripl_cnloh)

# Two and four copies
y_2_4 = subset_by_segment_totalcn(x, totalcn = c(2, 4))

print(y_2_4)

Reference genome coordinates

CNAqc uses a genome coordinates reference system to convert relative relative to absolute coordinates, a step required to plot segments across the whole genome (see plot_segments). For instance, if a mutation maps to position $100$ of chromosome chr2, its absolute coordinate is $100 + L$ where $L$ is the length of chr1. The reference system adopted by CNAqc needs therefore to report the length of each chromosome, plus the information regarding the boundary of each centromere.

CNAqc supports two coordinates reference genomes:

hg19 or GRCh37;
hg38 or GRCh38 (default),

for which two dataframes are stored inside the package.

CNAqc:::get_reference("hg19") # equivalent to CNAqc:::get_reference("GRCh37")

CNAqc:::get_reference("GRCh38") # equivalent to CNAqc:::get_reference("hg38")

The reference genomes has to be specified when you create a CNAqc object -- see function init.

Note: mapping of mutations onto segments is independent of the reference genome, and it will work as far as both mutation and CNA segments are mapped to the same reference.

You can use a hidden function to plot a reference

CNAqc:::blank_genome(ref = 'hg19') + 
  ggplot2::labs(title = "HG19 genome reference")

Example CNAqc object(s)

CNAqc comes with an object released by PCAWG

CNAqc::example_PCAWG

caravagnalab/CNAqc documentation built on June 2, 2025, 1:21 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

caravagnalab/CNAqc
CNAqc - Copy Number Analysis quality check

In caravagnalab/CNAqc: CNAqc - Copy Number Analysis quality check

Input format

Somatic mutations

Adding driver mutations

Copy number segments

Adding subclonal copy numbers

Tumour purity

Initialisation of a new dataset

Subsetting data

Reference genome coordinates

Example CNAqc object(s)

R Package Documentation

Browse R Packages

We want your feedback!

caravagnalab/CNAqc CNAqc - Copy Number Analysis quality check

In caravagnalab/CNAqc: CNAqc - Copy Number Analysis quality check

Input format

Somatic mutations

Adding driver mutations

Copy number segments

Adding subclonal copy numbers

Tumour purity

Initialisation of a new dataset

Subsetting data

Reference genome coordinates

Example CNAqc object(s)

R Package Documentation

Browse R Packages

We want your feedback!

caravagnalab/CNAqc
CNAqc - Copy Number Analysis quality check