The inference requires only a matrix of window-specific mutation rates. Below we use a matrix calculated based on TOPMed dataset:
library(vrnmf) library(spacemut) rate.info <- read.table("http://pklab.med.harvard.edu/ruslan/spacemut/tracks/TOPMed_10kb.txt",header=TRUE)
head(rate.info[,1:6])
knitr::kable(head(rate.info[,1:6]))
Retain only columns of regional mutation rates:
rate <- rate.info[,-c(1:3)]
First, matrix is preprocessed and required statistics of covariance matrix are estimated using vol_preprocess
routine:
vol.info <- vol_preprocess(rate)
Volume-regularized NMF is applied to the co-occurence mutation matrix stored in the object vol.info
to infer n.comp = 14
components using volume weight wvol = 7.9e-3
(see the manuscript for the motivation behind the choice of parameters). Alternating optimization was executed for n.iter = 3e+3
iterations with vol.iter
and c.iter
updates of matrices inside each iteration:
vr <- volnmf_main(vol.info, n.comp = 14, wvol = 7.9e-3, seed = 1, n.iter = 3e+3, vol.iter = 2e+1, c.iter = 2e+1, verbose = FALSE)
Non-normalized spectra of mutational components are available in the matrix vr$C
. To reflect the results in the paper, mutational spectra are then renormalized to the genome-average vector of mutation rates:
S <- vr$C*vol.info$col.factors/colMeans(rate) rownames(S) <- colnames(rate) colnames(S) <- paste("comp.",1:ncol(S))
Note that order of components (columns of S
) is arbitrary, and may not be consistent with that used in the manuscript.
Spectra can now be visualized:
draw.signature(S[,2])
Additionally, we can explore reflection properties of components:
plot.reflection.matrix(S)
And visualize reflections of individual components:
reflection.scatter(2,3,S)
Given known spectra of mutational components, intensities can then be obtained using non-negative least squares. A fast per-window estimation is implemented in a routine infer_intensities
. However, we will additionally model offset spectrum, that would reflect all unaccounted mutational forces, using a routine factor_intensities
(it takes a few to ten minutes to optimize):
intensities.info <- factor_intensities(as.matrix(vr$C), t(as.matrix(vol.info$X.process)), fit.nmf = FALSE, fit.factor = TRUE, n.iter = 1e+2, verbose = FALSE)
str(intensities.info)
intensities <- t(intensities.info$intensities)
Intensities can now be visualized:
plot.intensities(intensities[,13], rate.info, chr=16, start=0,end=4e+7, span.wind=30)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.