knitr::opts_chunk$set(
  collapse = TRUE
)

TrackSig uses mutational signatures to find changes in distribution of mutation types, that may indicate the presence of subclones in cancer. TrackSigFreq is an extension of TrackSig, which also incorporates variant density into the model. Both the TrackSig and TrackSigFreq approaches are supported in the R package TrackSig.

TrackSig is designed for use with whole genome sequencing (WGS) data. Changes in mutation distribution may be detectable in whole exome sequencing (WES) data, but samples may not contain sufficient mutations to satisfy the assumptions of the model, in particular if employing the TrackSigFreq approach.

Demo

Using the example data provided in the TrackSig package, the following demo will walk through detecting signatures present in a sample, retrieving the fitted signature trajectory, and plotting the trajectory.

library(TrackSig)
library(ggplot2)

Each mutational signature can be represented as a categorical distribution over 96 substitution mutation types.

TrackSig provides signature definitions published in Alexandrov et.al (2013). However by default, TrackSig uses a modified version of these where sub-signatures are merged (ex. SBS7a, SBS7b, SBS7c, SBS7d are collapsed to SBS7) and 'noisy signatures' SBS2 and SBS13 also merged (SBS2.13). The user may provide their own signature definitions if desired.

TrackSig fits signatures provided to a sample using expectation-maximization (EM) algorithm. To reduce noise when fitting signatures and speed up the fitting process, it is recommended to subset the entire signature catalog to only those active in the sample. TrackSig provides functionality for detecting active signatures.

Detecting active signatures

First, restrict the list of signatures to fit. This is recommended for improving speed. Here, we choose a threshold of 5%, meaning that signatures with activity below this level across all timepoints will not be fit.

vcfFile = system.file(package = "TrackSig", "extdata/Example.vcf")
cnaFile = system.file(package = "TrackSig", "extdata/Example_cna.txt")
purity = 1

detectedSigs <- detectActiveSignatures(vcfFile = vcfFile, cnaFile = cnaFile,
                                       purity = purity, threshold = 0.05)

There is a warning about matching the reference genome, because the example vcf file is synthetically generated and not mutations from a real patient.

Computing trajectory

TrackSig bins the mutation data in variant frequency space, and scores each partitioning of the bins. The set of mixtures of mutational signature activities in each bin is the trajectory over (pseudo)time. A too-small bin size will take more time to fit, and provide fewer mutations per bin to support the fit, thereby increasing noise in each bin. A too-large bin size will act like treating the whole sample as one bin, and in this case it is possible that no changepoints are identified. Where possible, it is recommended to pick the bin size such that there are approximately 400 mutations per bin.

The function TrackSig() has three available methods for segmentation, controlled by the parameter scoreMethod. These are:

Lets compute the trajectory for all timepoints.

set.seed(1224)

traj <- TrackSig(sampleID = "example", activeInSample = detectedSigs,
                 vcfFile = vcfFile, cnaFile = cnaFile, purity = purity,
                 scoreMethod = "SigFreq", binSize = 100)

The traj object contains the activities for each signature (rownames) in each bin (colnames). We can inspect the first five:

traj$mixtures[,0:5]

The traj object also contains the dataframe binData, which is used for plotting, and made external for the user's reference. Inspecting the first 5 entries reveals variant data, some of which is provided to TrackSig as input, and some of which is computed.

head(traj$binData,5)

Plotting trajectories

plotTrajectory() takes a list object, as prepared by TrackSig().

linPlot <- plotTrajectory(traj, linearX = T) 
linPlot

plotTrajectory() returns a ggplot object, which may be further modified, as desired. As an example, we can add a title, and rotate tick labels.

linPlot <- linPlot + labs(title = "Example trajectory with linear x-axis")
linPlot <- linPlot + theme(axis.text.x=element_text(angle=45, hjust=1))
linPlot

Cancer cell fraction should in theory never exceed 1. However, as described in Tarabichi et. al (2021), "false-positive SNVs or inaccurate CNAs can cause spurious superclonal clusters (that is, those with CCF > 1)". There are cases where it is desirable to display these as average number of mutant alleles per cell (ANMAC), which is controlled via the argument anmac

Because each bin is equal size, they appear warped in terms of mutation density. plotTrajectory() takes argument linearX which decides which way to draw the x-axis: scaling bin size to the density of mutations within it, or not.

nonLinPlot <- plotTrajectory(traj, linearX = F, anmac = T) + labs(title = "Example trajectory with non-linear x-axis")
nonLinPlot

When plotting with non-linear axis, can use addPhiHist() to display histogram of mutations on top of trajectory plot. Note that cowplot is used to align the axes, and the object returned is a TableGrob, not a ggplot object.

plotGrobs <- addPhiHist(traj, nonLinPlot)
grid::grid.draw(plotGrobs)


morrislab/TrackSigFreq documentation built on July 5, 2021, 6:33 a.m.