vrnmf_germline

Loading the data

The inference requires only a matrix of window-specific mutation rates. Below we use a matrix calculated based on TOPMed dataset:

library(vrnmf)
library(spacemut)

rate.info <- read.table("http://pklab.med.harvard.edu/ruslan/spacemut/tracks/TOPMed_10kb.txt",header=TRUE)
head(rate.info[,1:6])
knitr::kable(head(rate.info[,1:6]))

Retain only columns of regional mutation rates:

rate <- rate.info[,-c(1:3)]

Vrnmf inference of spectra of mutational components

First, matrix is preprocessed and required statistics of covariance matrix are estimated using vol_preprocess routine:

vol.info <- vol_preprocess(rate)

Volume-regularized NMF is applied to the co-occurence mutation matrix stored in the object vol.info to infer n.comp = 14 components using volume weight wvol = 7.9e-3 (see the manuscript for the motivation behind the choice of parameters). Alternating optimization was executed for n.iter = 3e+3 iterations with vol.iter and c.iter updates of matrices inside each iteration:

vr <- volnmf_main(vol.info, n.comp = 14, wvol = 7.9e-3, seed = 1,
                  n.iter = 3e+3, vol.iter = 2e+1, c.iter = 2e+1, 
                  verbose = FALSE)

Non-normalized spectra of mutational components are available in the matrix vr$C. To reflect the results in the paper, mutational spectra are then renormalized to the genome-average vector of mutation rates:

S <- vr$C*vol.info$col.factors/colMeans(rate)
rownames(S) <- colnames(rate)
colnames(S) <- paste("comp.",1:ncol(S))

Note that order of components (columns of S) is arbitrary, and may not be consistent with that used in the manuscript. Spectra can now be visualized:

draw.signature(S[,2])

Additionally, we can explore reflection properties of components:

plot.reflection.matrix(S)

And visualize reflections of individual components:

reflection.scatter(2,3,S)

Estimation of intensities of mutational components

Given known spectra of mutational components, intensities can then be obtained using non-negative least squares. A fast per-window estimation is implemented in a routine infer_intensities. However, we will additionally model offset spectrum, that would reflect all unaccounted mutational forces, using a routine factor_intensities (it takes a few to ten minutes to optimize):

intensities.info <- factor_intensities(as.matrix(vr$C), t(as.matrix(vol.info$X.process)), 
                                  fit.nmf = FALSE, fit.factor = TRUE, 
                                  n.iter = 1e+2, verbose = FALSE) 
str(intensities.info)
intensities <- t(intensities.info$intensities)

Intensities can now be visualized:

plot.intensities(intensities[,13], rate.info, chr=16, start=0,end=4e+7,  span.wind=30)


hms-dbmi/spacemut documentation built on Feb. 8, 2021, 6:27 a.m.