This vignette shows how we can simulate the stochastic process by which tumors accrue a specific set of mutations (more or less) sequentially, or accrue alternate mutually exclusive mutations. After positing some mutation pattern as might be associated with a particular cancer, we generate a population of tumors and the associated SNV data recapitulating mutation patterns determined by this underlying pattern.
First, we'll generate a transition matrix that determines which mutations arise given which other mutations exist. We'll try a simple case with 6 mutations. Then, we'll generate a tumor using the conditional probability matrix.
library(TACG) library(igraph) library(gtools) #Generating 6 mutation loci: set.seed(123) loci <- sample(1:100000,6,replace=FALSE) chrs <- sort(sample(1:22,length(loci),replace=TRUE)) probset <- c(.6,.15,.1,.05,.05,.05) genenames <- c('A','B','C','D','E','F') psi <- c(.35,.28,.17,.1,.07,.03) mutationProbs <- generateMutationProbs(loci, chrs, genenames, mutually.exclusive=list(c(2,4)), probs=probset) tumor <- Tumor(psi, rounds=length(psi), pcnv=0, nu=0, mutationProbs=mutationProbs, labelSize=.5) data <- generateTumorData(tumor,snps.seq=0,snps.ary=NA,mu=100,sigma.reads=10,sigma0.lrr=.15, sigma0.baf=.03,density.sigma=.1)
The function 'generateMutationProbs' randomly generates a matrix of conditional mutation probabilities. Loci, chromosomes, and genenames are given, as are mutual exclusivity relationships. One can also manually create an object containing this information.
To illustrate how this conditional matrix is equivalent to sampling from a set of DAGs, I'll show how one can arbitrarily constrain what DAGs are possible by how one constructs the matrix. For example, let's suppose we have 6 mutations, which we'll call m1 through m6.
probmat <- matrix(0,nrow=7,ncol=6) rowIndices <- c(1:3,5,6) colIndices <- c(1:3,6,6) for(i in 1:length(rowIndices)){ probmat[rowIndices[i],colIndices[i]] <- 1 } probmat[4,4] <- .5 probmat[4,5] <- .5
There are two possible paths through tha graph shown here:
DAG.df <- data.frame('Parent'=c('Normal','m1','m2','m3','m3','m4','m5'), 'Node'=c('m1','m2','m3','m4','m5','m6','m6')) DAG <- graph.data.frame(DAG.df) plot(DAG, layout = layout.reingold.tilford)
The process that simulates the tumor samples fromp ossible paths going through m4 and m5 with probabilities 0.6 and 0.4, respectively. This is of course conditional on the process making it far enough to reach the branch point. Realistically, one may sample a patient prior to their tumor reaching a given point in the DAG, which can be simulated here as well, as shown below.
Another example, where all paths are possible except at the juncture with mutual exclusivity:
probmat <- t(matrix(c(rep(1/4,4),0,rep(1/3,4),0,1/3,1/3,1/2,1/2,0,0,1/2,1/2,0,0), nrow=4,ncol=5)) DAG2.df <- data.frame('Parent'=c(rep('Normal',4),rep('m1',3),'m2','m2'), 'Node'=c('m1','m2','m3','m4','m2','m3','m4','m3','m4')) DAG2 <- graph.data.frame(DAG2.df) plot(DAG2, layout = layout.reingold.tilford)
For this example, there are eight possible paths from which the process samples.
The 'tumor' object that results contains a slot with an object named 'tree.' The object 'tree' consists of a data frame describing the nodes in the mutation graph (specifically directed acyclic graph, or DAG) for this tumor.
tree <- tumor@tree tree.df <- tree$tree.df
We can use igraph to plot the clonal tree for the tumor generated above, where each clone is also labelled with the gene in which the mutation occurred that distinguined the clone from its parent.
plot(tree$tree.obj, layout = layout.reingold.tilford)
Plotting the variant allele fraction data, without passengers:
seqdf <- data$seq.data somatic <- seqdf[seqdf$status=='somatic',] plot(as.numeric(as.character(somatic$VAF)), pch=16, xlab='Mutation Index',ylab='VAF')
We can also generate a population:
N <- 5 ks <- sample(2:6,N,replace=TRUE) rs <- ks + sample(0:5,N,replace=TRUE) psiset <- lapply(1:N,function(i){as.vector(rdirichlet(1,alpha=rep(1,ks[i])))}) tumors <- lapply(1:N,function(i){ Tumor(psi=psiset[[i]], rounds=rs[i], pcnv=0, nu=2, mutationProbs=mutationProbs, labelSize=.5) }) somaticDataSet <- lapply(1:length(tumors),function(i){ tumorDataSet <- generateTumorData(tumors[[i]],snps.seq=0,snps.ary=NA,mu=100,sigma.reads=10,sigma0.lrr=.15, sigma0.baf=.03,density.sigma=.1) seqdf <- tumorDataSet$seq.data seqdf[seqdf$status=='somatic',] })
Plotting the resultant tumor DAGs:
plot(tumors[[1]]@tree$tree.obj, layout = layout.reingold.tilford) plot(tumors[[2]]@tree$tree.obj, layout = layout.reingold.tilford) plot(tumors[[3]]@tree$tree.obj, layout = layout.reingold.tilford) plot(tumors[[4]]@tree$tree.obj, layout = layout.reingold.tilford) plot(tumors[[5]]@tree$tree.obj, layout = layout.reingold.tilford)
And the simulated data VAFs (now with some passengers):
plot(as.numeric(as.character(somaticDataSet[[1]]$VAF)),xlab='Mutation Index',ylab='VAF',pch=16) plot(as.numeric(as.character(somaticDataSet[[2]]$VAF)),xlab='Mutation Index',ylab='VAF',pch=16) plot(as.numeric(as.character(somaticDataSet[[3]]$VAF)),xlab='Mutation Index',ylab='VAF',pch=16) plot(as.numeric(as.character(somaticDataSet[[4]]$VAF)),xlab='Mutation Index',ylab='VAF',pch=16) plot(as.numeric(as.character(somaticDataSet[[5]]$VAF)),xlab='Mutation Index',ylab='VAF',pch=16)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.