suppressPackageStartupMessages(library(knitr))
suppressPackageStartupMessages(library(katdetectr))

Abstract

katdetectr is an R package for the detection, characterization and visualization of localized hypermutated regions, often referred to as kataegis.

Please see the Application Note (under submission) for additional background, details and performance evaluations of katdetectr.

The general workflow of katdetectr can be summarized as follows:

  1. Import of genomic variants from VCF, MAF or VRanges objects.
  2. Detection of kataegis foci.
  3. Visualization of segmentation and kataegis foci.

Below, this workflow is performed step by step on publicly available datasets that are included within this package.

Importing genomic variants

Genomic variants can be imported into katdetectr from several common data formats (VCF, MAF and VRanges objects). The input can contain either a single sample or multiple samples; in the latter case, records can be aggregated by setting aggregateRecords = TRUE. Overlapping genomic variants (e.g., an InDel and SNV) are reduced into a single record.

From these genomic variants, we calculate the intermutation distance (IMD). The IMD is defined as the genomic distance (in bp) between a genomic variant and its respective nearest upstream genomic variant (5' A <- B 3').

# Genomic variants stored within the VCF format.
pathToVCF <- system.file(package = "katdetectr", "extdata/CPTAC_Breast.vcf")

# Genomic variants stored within the MAF format.
pathToMAF <- system.file(package = "katdetectr", "extdata/APL_primary.maf")

# In addition, we can generate synthetic genomic variants with interjected kataegis regions
# using generateSyntheticData(). This will output a VRanges object.
syntheticData <- generateSyntheticData(nBackgroundVariants = 2500, nKataegisFoci = 1)
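
To make the IMD definition above concrete, the following is a minimal base-R sketch on a handful of made-up positions from a single chromosome. The positions are hypothetical, and assigning NA to the first variant (which has no upstream neighbour) is an assumption of this sketch; katdetectr itself may treat that case differently.

```r
# Hypothetical variant positions (bp) on one chromosome; illustration only.
positions <- c(5450, 1200, 5400, 90000, 5520, 5600)

# Order the variants along the chromosome (5' to 3').
positions <- sort(positions)

# IMD: distance from each variant to its nearest upstream variant.
# Assumption of this sketch: the first variant gets NA.
imd <- c(NA, diff(positions))

imdTable <- data.frame(position = positions, IMD = imd)
imdTable
```

In this toy example, the run of small IMD values in the middle of the table is the kind of local clustering that the detection step looks for.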

Detection of kataegis foci

Using detectKataegis(), we can employ changepoint detection to detect distinct clusters of varying IMD and size. Various underlying parameters can be altered to improve or alter the default methodology.

# Detect kataegis foci within the given VCF file.
kdVCF <- detectKataegis(genomicVariants = pathToVCF)

# Detect kataegis foci within the given MAF file.
# As this file contains multiple samples, we set aggregateRecords = TRUE.
kdMAF <- detectKataegis(genomicVariants = pathToMAF, aggregateRecords = TRUE)

# Detect kataegis foci within our synthetic data.
kdSynthetic <- detectKataegis(genomicVariants = syntheticData)
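
As a rough illustration of how segments of varying IMD and size could be designated as putative kataegis foci once changepoint detection has produced them, here is a minimal base-R sketch. It does not call katdetectr. The segment table, its column names and the two thresholds (at least 6 variants and a mean IMD of at most 1000 bp, a commonly cited convention for kataegis) are assumptions made for this example, and they are not confirmed here as the package's defaults.

```r
# Hypothetical segments, as a changepoint step might report them; illustration only.
segments <- data.frame(
  segmentID = 1:4,
  totalVariants = c(120L, 8L, 4L, 300L),
  meanIMD = c(25000, 310, 150, 41000)
)

# Assumed thresholds for this sketch (not confirmed as katdetectr defaults).
minVariants <- 6
maxMeanIMD <- 1000

# A segment is flagged only if it is both dense (low mean IMD) and large enough.
segments$putativeKataegis <-
  segments$totalVariants >= minVariants & segments$meanIMD <= maxMeanIMD

segments[segments$putativeKataegis, ]
```

Segment 3 shows why both criteria matter in this sketch: its mean IMD is low, but it holds too few variants to be flagged.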

All relevant input and subsequent results are stored within KatDetect objects. Using summary(), show() and/or print(), we can generate overviews of these objects.

summary(kdVCF)
print(kdVCF)
show(kdVCF)

# Or simply:
kdVCF

Underlying data can be retrieved from these KatDetect objects by the following functions:

# Processed genomic variants used as input for changepoint detection.
getGenomicVariants(kdVCF)

# GRanges containing the segments as derived from changepoint detection.
getSegments(kdVCF)

# GRanges containing segments designated as putative kataegis foci.
getKataegisFoci(kdVCF)

# Supplementary information.
getInfo(kdVCF)

Visualization of segmentation and kataegis foci

Per sample, we can visualize the IMD, detected segments and putative kataegis foci as a rainfall plot. The plot can also be restricted to individual chromosomes, which helps to highlight the putative kataegis foci.

rainfallPlot(kdVCF)

# With showSegmentation, the detected segments (changepoints) are visualized with their mean IMD.
rainfallPlot(kdMAF, showSegmentation = TRUE)

# With showSequence, we can display specific chromosomes or all chromosomes in which a putative kataegis focus has been detected.
rainfallPlot(kdSynthetic, showKataegis = TRUE, showSegmentation = TRUE, showSequence = "Kataegis")

Session Information

utils::sessionInfo()


daanhazelaar/katdetectr documentation built on June 3, 2022, 4:58 a.m.