knitr::opts_chunk$set( collapse = TRUE, comment = "##" )
The vignette describes inference of spatially-varying mutational processes. The guideline starts with matrix of regional mutation rates of mutation types in genomic regions, infers mutational components, estimates their biological relevance and groups in strand-independend/strand-dependent mutational processes. Finally, the guideline provides quality estimates of reconstructed processes using bootstrap and reflection correlations.
library(spacemut)
Matrix of regional mutation rates is loaded from the local web server:
info <- read.table("http://pklab.med.harvard.edu/ruslan/spacemut/TOPMed_30kb.txt",header=TRUE) dim(info)
head(info[,1:6])
knitr::kable(head(info[,1:6]))
Retain only columns of regional mutation rates:
rate <- info[,-c(1:3)]
Function extract.comp
decomposes observed regional mutation rates in a small number of mutational components using independent component analysis. Below we estimate 13 mutational components:
comp.info <- extract.comp(rate,13)
We rearrange components to have their order and signs consistent with the paper:
process.order <- c(1,2,9,10,13,8,4,5,3,6,12,11,7) sign.change <- 7 comp.info <- rearrange.components(comp.info,process.order,sign.change)
Output object comp.info
consists of matrices of components spectra and intensities. Matrix of components spectra:
icM <- comp.info[[1]] head(icM[,1:4])
knitr::kable(head(icM[,1:4]))
Matrix of components intensities:
icS <- comp.info[[2]] head(icS[,1:4])
knitr::kable(head(icS[,1:4]))
Of note, spectra and intensities have negative values due to Z-score transformation of mutation rates in the method. Positive (negative) values in spectra and intensities indicate higher (lower) values compared to genome-wide average. Reflection matrix of correlations between spectrum and reverse complementary spectrum of each pair of components is used to separate extracted components in strand symmetric, asymmetric and noise:
plot.reflection.matrix(icM)
For example, comp. 10
is symmetic, since it has spectrum reverse complementary to itself:
reflection.scatter(10,10,icM)
On the other hand comp. 1
is not symmetrical:
reflection.scatter(1,1,icM)
However, comp. 1
is reflected to comp. 2
spectrum representing a single strand-dependent process:
reflection.scatter(1,2,icM)
Function reflection.test
formally classifies components in symmetric/asymmetric, noise and combine them in mutational processes.
ref.prop <- reflection.test(icM)
Object ref.prop
contains classification of components, including symmetric/asymmteric type
and reflected component ref.comp
for each component:
head(ref.prop[[1]])
knitr::kable(head(ref.prop[[1]]))
A symmetric mutational component corresponds to a strand-independent mutational process, while a pair of two reflected components corresponds to a single strand-dependent mutational process. Annotation of mutational processes includes components that correspond to it (comp.X
comp.Y
):
head(ref.prop[[2]])
knitr::kable(head(ref.prop[[2]]))
Spectrum visualization of symmetric comp. 10
with equal rates of complementary mutations:
draw.signature(icM[,10])
Spectrum visualization of asymmetric comp. 1
with imbalanced rates of complementary mutations:
draw.signature(icM[,1])
Routine plot.intensities
enables exploration of spatial variation in component intensity along the genome. Intensity of maternal process, estimated as sum of intensities of comp. 4
and comp. 5
, along left arm of chromosome 8 is shown below:
plot.intensities(icS[,7]+icS[,8], info, chr=8, start=0,end=4e+7, span.wind=30)
Finally, robustness of components can be evaluated using statistical bootstrap of genomic regions or reflection correlation. Rounine spectra.bootstrap
estimates Spearman correlations between a component in original and n.boot
bootstrapped inferences:
boot <- spectra.bootstrap(icM,rate,n.boot=100,n.cores=20)
Visualization of summarized statistics:
visualize.bootstrap(boot,icM=icM)
Number 13 of initially extracted components may seems rather arbitrary. Reflection property suggests a natural criterion to evaluate number of components to extract that maximumizes number of components having reflection property. More specifically, routine select.ncomponents
extracts a range of components from n.min
and n.max
and evaluates optimal choice:
stat.extract <- select.ncomponents(rate,n.min=2,n.max=35,cutoff=0.7,n.cores=20) head(stat.extract)
knitr::kable(head(stat.extract,3))
Number of components and processes having reflection property depending on number of extracted components:
show.optimal.comp(stat.extract)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.