knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

CINmetrics

The goal of CINmetrics package is to provide different methods of calculating copy number aberrations as a measure of Chromosomal Instability (CIN) metrics from the literature that can be applied to any cancer data set including The Cancer Genome Atlas.

library(CINmetrics)

The dataset provided with CINmetrics package is masked Copy Number variation data for Breast Cancer for 10 unique samples selected randomly from TCGA.

dim(maskCNV_BRCA)

Alternatively, you can download the entire dataset from TCGA using TCGAbiolinks package

## Not run:
#library(TCGAbiolinks)
#query.maskCNV.hg39.BRCA <- GDCquery(project = "TCGA-BRCA",
#              data.category = "Copy Number Variation",
#              data.type = "Masked Copy Number Segment", legacy=FALSE)
#GDCdownload(query = query.maskCNV.hg39.BRCA)
#maskCNV.BRCA <- GDCprepare(query = query.maskCNV.hg39.BRCA, summarizedExperiment = FALSE)
#maskCNV.BRCA <- data.frame(maskCNV.BRCA, stringsAsFactors = FALSE)
#tai.test <- tai(cnvData = maskCNV.BRCA)
## End(Not run)

Total Aberration Index

tai calculates the Total Aberration Index (TAI; Baumbusch LO, et. al.), "a measure of the abundance of genomic size of copy number changes in a tumour". It is defined as a weighted sum of the segment means ($|\bar{y}_{S_i}|$).

Biologically, it can also be interpreted as the absolute deviation from the normal copy number state averaged over all genomic locations.

$$ Total\ Aberration\ Index = \frac {\sum^{R}{i = 1} {d_i} \cdot |{\bar{y}{S_i}}|} {\sum^{R}{i = 1} {d_i}}\ \ where |\bar{y}{S_i}| \ge |\log_2 1.7| $$

tai.test <- tai(cnvData = maskCNV_BRCA)
head(tai.test)

Modified Total Aberration Index

taiModified calculates a modified Total Aberration Index using all sample values instead of those in aberrant copy number state, thus does not remove the directionality from the score.

$$ Modified\ Total\ Aberration\ Index = \frac {\sum^{R}{i = 1} {d_i} \cdot {\bar{y}{S_i}}} {\sum^{R}_{i = 1} {d_i}} $$

modified.tai.test <- taiModified(cnvData = maskCNV_BRCA)
head(modified.tai.test)

Copy Number Aberration

cna calculates the total number of copy number aberrations (CNA; Davidson JM, et. al.), defined as a segment with copy number outside the pre-defined range of 1.7-2.3 ($(\log_2 1.7 -1) \le \bar{y}{S_i} \le (\log_2 2.3 -1)$) that is not contiguous with an adjacent independent CNA of identical copy number. For our purposes, we have adapted the range to be $|\bar{y}{S_i}| \ge |\log_2 1.7|$, which is only slightly larger than the original.

This metric is very similar to the number of break points, but it comes with the caveat that adjacent segments need to have a difference in segmentation mean values.

$$ Total\ Copy\ Number\ Aberration = \sum^{R}{i = 1} n_i \ \ where\ \ \begin{align} |\bar{y}{S_i}| \ge |\log_2{1.7}|, \ |\bar{y}{S{i-1}} - \bar{y}_{S_i}| \ge 0.2, \ d_i \ge 10 \end{align} $$

cna.test <- cna(cnvData = maskCNV_BRCA)
head(cna.test)

Counting Altered Base segments

countingBaseSegments calculates the number of altered bases defined as the sums of the lengths of segments ($d_i$) with an absolute segment mean ($|\bar{y}_{S_i}|$) of greater than 0.2.

Biologically, this value can be thought to quantify numerical chromosomal instability. This is also a simpler representation of how much of the genome has been altered, and it does not run into the issue of sequencing coverage affecting the fraction of the genome altered.

$$ Number\ of\ Altered\ Bases = \sum^{R}{i = 1} d_i\ where\ |\bar{y}{S_i}| \ge 0.2 $$

base.seg.test <- countingBaseSegments(cnvData = maskCNV_BRCA)
head(base.seg.test)

Counting Number of Break Points

countingBreakPoints calculates the number of break points defined as the number of segments ($n_i$) with an absolute segment mean greater than 0.2. This is then doubled to account for the 5' and 3' break points.

Biologically, this value can be thought to quantify structural chromosomal instability.

$$ Number\ of \ Break\ Points = \sum^{R}{i = 1} (n_i \cdot 2)\ where\ |\bar{y}{S_i}| \ge 0.2 $$

break.points.test <- countingBreakPoints(cnvData = maskCNV_BRCA)
head(break.points.test)

Fraction of Genome Altered

fga calculates the fraction of the genome altered (FGA; Chin SF, et. al.), measured by taking the sum of the number of bases altered and dividing it by the genome length covered ($G$). Genome length covered was calculated by summing the lengths of each probe on the Affeymetrix 6.0 array. This calculation excludes sex chromosomes.

$$ Fraction\ Genome\ Altered = \frac {\sum^{R}{i = 1} d_i} {G} \ \ where\ |\bar{y}{S_i}| \ge 0.2 $$

fraction.genome.test <- fga(cnvData = maskCNV_BRCA)
head(fraction.genome.test)

CINmetrics

CINmetrics calculates tai, cna, number of altered base segments, number of break points, and fraction of genome altered and returns them as a single data frame.

cinmetrics.test <- CINmetrics(cnvData = maskCNV_BRCA)
head(cinmetrics.test)


lasseignelab/CINmetrics documentation built on March 23, 2023, 10:38 p.m.