suppressPackageStartupMessages(library(knitr))
suppressPackageStartupMessages(library(katdetectr))
katdetectr is an R package for the detection, characterization and visualization of localized hypermutated regions, often referred to as kataegis.
Please see the Application Note (under submission) for additional background, details and performance evaluations of katdetectr.
The general workflow of katdetectr can be summarized as follows:
Below, this workflow is performed in a step-by-step manner on publicly-available datasets which are included within this package.
Genomic variants from multiple common data formats (VCF/MAF files and VRanges objects) can be imported into katdetectr.
These can contain either a single sample or multiple samples; in the latter case, records can be aggregated by setting aggregateRecords = TRUE. Overlapping genomic variants (e.g., an InDel and SNV) are reduced into a single record.
From these genomic variants, we calculate the intermutation distance (IMD). The IMD is defined as the genomic distance (in bp) between a genomic variant and its respective nearest upstream genomic variant (5' A <- B 3').
# Genomic variants stored within the VCF format.
pathToVCF <- system.file(package = "katdetectr", "extdata/CPTAC_Breast.vcf")

# Genomic variants stored within the MAF format.
pathToMAF <- system.file(package = "katdetectr", "extdata/APL_primary.maf")

# In addition, we can generate synthetic genomic variants with interjected kataegis regions
# using generateSyntheticData(). This will output a VRanges object.
syntheticData <- generateSyntheticData(nBackgroundVariants = 2500, nKataegisFoci = 1)
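To make the IMD definition concrete, here is a minimal base-R sketch. It is not katdetectr's internal implementation: the helper name computeIMD and the toy positions are hypothetical, all variants are assumed to lie on a single chromosome, and the first variant is simply given NA because it has no upstream neighbour.

```r
# Hypothetical helper: distance (bp) from each variant to its nearest
# upstream variant. Assumes all positions are on the same chromosome.
computeIMD <- function(positions) {
  ord <- order(positions)
  sortedIMD <- c(NA, diff(positions[ord]))
  # Put the distances back into the original input order.
  imd <- rep(NA_real_, length(positions))
  imd[ord] <- sortedIMD
  imd
}

# Toy positions in bp (illustrative only, deliberately unsorted).
toyPositions <- c(150, 100, 155, 160, 90000)
toyIMD <- computeIMD(toyPositions)
print(data.frame(position = toyPositions, IMD = toyIMD))
```

Variants that sit close together (here 150, 155 and 160) get small IMD values, which is the signal a kataegis detector looks for.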
Using detectKataegis(), we can employ changepoint detection to detect distinct clusters of varying IMD and size.
Various underlying parameters can be adjusted to refine or change the default methodology.
# Detect kataegis foci within the given VCF file.
kdVCF <- detectKataegis(genomicVariants = pathToVCF)

# Detect kataegis foci within the given MAF file.
# As this file contains multiple samples, we set aggregateRecords = TRUE.
kdMAF <- detectKataegis(genomicVariants = pathToMAF, aggregateRecords = TRUE)

# Detect kataegis foci within our synthetic data.
kdSynthetic <- detectKataegis(genomicVariants = syntheticData)
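As an illustration of the idea behind changepoint detection on IMD values, the base-R sketch below finds a single changepoint by maximizing an exponential log-likelihood. It is a deliberately simplified stand-in and not necessarily what detectKataegis() does internally; practical methods search for multiple changepoints and penalize extra ones. The helper names segLogLik and findSingleChangepoint, the minSegLen argument and the toy data are all hypothetical.

```r
# Log-likelihood of one segment under an exponential model,
# evaluated at the maximum-likelihood rate n / sum(x).
segLogLik <- function(x) {
  n <- length(x)
  n * log(n / sum(x)) - n
}

# Hypothetical helper: best single changepoint in a vector of IMDs.
# Returns the index of the last observation in the first segment.
findSingleChangepoint <- function(imd, minSegLen = 2L) {
  n <- length(imd)
  stopifnot(n >= 2L * minSegLen)
  candidates <- seq(minSegLen, n - minSegLen)
  logLik <- vapply(candidates, function(k) {
    segLogLik(imd[1:k]) + segLogLik(imd[(k + 1):n])
  }, numeric(1))
  candidates[which.max(logLik)]
}

# Toy IMDs (bp): ten widely spaced variants, then eight tightly clustered ones.
toyIMD <- c(rep(50000, 10), rep(100, 8))
changepoint <- findSingleChangepoint(toyIMD)
print(changepoint)
```

The split is placed where the mean IMD changes most sharply, which separates the sparse background from the dense cluster.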
All relevant input and subsequent results are stored within KatDetect objects.
Using summary(), show() and/or print(), we can generate overviews of these KatDetect object(s).
summary(kdVCF)
print(kdVCF)
show(kdVCF)

# Or simply:
kdVCF
Underlying data can be retrieved from these KatDetect objects by the following functions:
# Processed genomic variants used as input for changepoint detection.
getGenomicVariants(kdVCF)

# GRanges containing the segments as derived from changepoint detection.
getSegments(kdVCF)

# GRanges containing segments designated as putative kataegis foci.
getKataegisFoci(kdVCF)

# Supplementary information.
getInfo(kdVCF)
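To illustrate how segments might be designated as putative kataegis foci, the base-R sketch below summarizes toy segments and flags those that are both dense and large enough. The thresholds (mean IMD of at most 1000 bp, at least 6 variants) follow a convention common in the kataegis literature and are assumptions here, not confirmed katdetectr defaults. All object and column names are hypothetical, and the real accessors return GRanges objects rather than data frames.

```r
# Toy per-variant table: segment ID and IMD in bp (illustrative values).
toyVariants <- data.frame(
  segmentID = c(1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 3, 3),
  IMD = c(40000, 65000, 52000, 120, 80, 300, 45, 210, 95, 150, 70000, 48000)
)

# Assumed thresholds (not confirmed katdetectr defaults).
imdCutoff <- 1000
minVariants <- 6

# Per-segment variant count and mean IMD; tapply() orders by segment ID.
segmentSummary <- data.frame(
  segmentID = sort(unique(toyVariants$segmentID)),
  nVariants = as.vector(tapply(toyVariants$IMD, toyVariants$segmentID, length)),
  meanIMD = as.vector(tapply(toyVariants$IMD, toyVariants$segmentID, mean))
)

# A segment is flagged when it is both dense and large enough.
segmentSummary$putativeKataegis <-
  segmentSummary$meanIMD <= imdCutoff & segmentSummary$nVariants >= minVariants
print(segmentSummary)
```

Only the middle segment, with seven variants and a mean IMD well below the cutoff, is flagged in this toy example.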
Per sample, we can visualize the IMD, detected segments and putative kataegis foci as a rainfall plot. In addition, this allows for a per-chromosome approach which can highlight the putative kataegis foci.
rainfallPlot(kdVCF)

# With showSegmentation, the detected segments (changepoints) are visualized with their mean IMD.
rainfallPlot(kdMAF, showSegmentation = TRUE)

# With showSequence, we can display specific chromosomes or all chromosomes
# in which a putative kataegis focus has been detected.
rainfallPlot(kdSynthetic, showKataegis = TRUE, showSegmentation = TRUE, showSequence = "Kataegis")
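For readers unfamiliar with rainfall plots, the base-R sketch below draws the core idea: genomic position on the x-axis against IMD on a logarithmic y-axis, so that clustered variants "rain down" toward the bottom. It does not reproduce the output of rainfallPlot(). The simulated positions and the dashed 1000 bp reference line are illustrative assumptions.

```r
set.seed(42)

# Toy data on one chromosome: sparse background variants plus one dense cluster.
background <- sample.int(5e6, 200)
cluster <- seq(2.5e6, by = 150, length.out = 15)
positions <- sort(unique(c(background, cluster)))

# IMD to the nearest upstream variant; the first variant has none.
imd <- c(NA, diff(positions))

# Rainfall-style scatter: position vs. IMD on a log scale.
plot(positions[-1], imd[-1], log = "y", pch = 16, cex = 0.6,
     xlab = "Genomic position (bp)", ylab = "IMD (bp, log scale)",
     main = "Toy rainfall plot")

# Illustrative reference line at an assumed 1000 bp cutoff.
abline(h = 1000, lty = 2)
```

The dense cluster appears as a vertical streak of points well below the dashed line, while the background variants stay scattered above it.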
utils::sessionInfo()