Proof of concept: filtering pipeline (Clustering quality) "

Creating data

This script is to show the importance of clustering high confidence variants and then attribute meaningful ones to the identified clusters

# Loading libraries
  devtools::install_github(repo = "DeveauP/QuantumClone")
if(!require(knitr)) install.packages("knitr")
if(!require(ggplot2)) install.packages("ggplot2")
if(!require(reshape2)) install.packages("reshape2")

# Creating reproducible example


*All functions are stored inside the reproduce_2.R, to avoid long display of codes. Below are the values that will be used throughout the testing *

number_iterations <- 10
number_mutations <- 200
ndrivers <- 20

We will first create a test set with QuantumCat with 6 clones, r number_mutations variants, diploid, with an average depth of 100X, two samples with respective purity 70% and 60%. We make sure these variants correspond to stringent filters (i.e. depth $> 50$X).<-QuantumCat_stringent(number_of_clones = 6,number_of_mutations = number_mutations,
                     ploidy = "AB",depth = 100,
                     contamination = c(0.3,0.4),min_depth = 50)

We check that all these variants are within the stringent filters (i.e depth $\geq 50$ X), and display the first six rows of the first sample:

sum([[1]]$Depth<50 |[[2]]$Depth<50)

Then we create r number_mutations mutations that are in permissive filters. For that we take r round(number_mutations/4) mutations with 30 to 50 depth, r round(number_mutations/2) that have a depth $\geq 30$ in triploid (AAB) loci and r round(number_mutations/4) that have a depth $\geq 30$ in a tetraploid (AABB) locus.

permissive<-QuantumCat_permissive(fromQuantumCat = ,number_of_mutations = number_mutations,
                               ploidy = "AB",depth = 100,
                               contamination = c(0.3,0.4),max_depth = 50, min_depth = 30)

We are now going to select r ndrivers drivers, with probability $3/4$ of being in the permissive filters.

drivers_id<-sample(1:(2*number_mutations),size = ndrivers,prob = rep(c(1/{4*number_mutations},
                                                                     each = number_mutations)

We now want to cluster mutations using only the filtered mutations (Paper pipeline), the filtered and drivers (extended), or all mutations alltogether (All), and compare the clustering quality of these different methods.

ext<-extended(filtered =,
              permissive = permissive,
              drivers_id = drivers_id)

all<-All(filtered =,
         permissive = permissive,
         drivers_id = drivers_id

pap<-paper_pipeline(filtered =,
                    permissive = permissive,
                    drivers_id = drivers_id)

We are now going to compare the quality of clustering using the Normalized Mutual Information, the number of clusters found (the truth being 6), the maximal and average error in the distance of a driver to its real position. N.B:

Quality<-compare_qual(paper = pap,
                      extended = ext,
                      all = all,
                      drivers_id = drivers_id)


We are now going to reproduce this test r number_iterations-1 times.


We can plot these results:

Melt<-melt(cbind(Quality),id = "Pipeline")

ggplot(Melt,aes_string(x= "Pipeline",y = "value",facet = "variable"))+geom_boxplot()+facet_wrap( ~ variable,nrow = floor(sqrt(length(unique(Melt$variable))))+1,scales = "free_y")+theme_bw()+theme(axis.text.x = element_text(angle = 90))

Try the QuantumClone package in your browser

Any scripts or data that you put into this service are public.

QuantumClone documentation built on May 2, 2019, 3:03 a.m.