A statistical framework leveraging data obtained from both single cell and bulk sequencing strategies.
The ddClone [@ddclone] approach is predicated on the notion that single cell sequencing
data will inform and improve clustering of allele fractions
derived from bulk sequencing data in a joint statistical model.
ddClone combines a Bayesian non-parametric prior informed by single cell
data with a likelihood model based on bulk sequencing data to infer
clonal population architecture. Intuitively, the prior encourages genomic
loci with co-occurring mutations in single cells to cluster together.
Using a cell-locus binary matrix from single cell sequencing,
ddClone computes a distance matrix between mutations using the Jaccard distance with exponential decay.
This matrix is then used as a prior for inference over mutation clusters and their prevalences from deeply
sequenced bulk data in a distance-dependent Chinese restaurant process [@ddcrp] framework.
The output of the model is the most probable set of mutational clusters present and the
prevalence of each mutation in the population.
The code is based on the ddCRP model, as introduced and implemented in [@ddcrp].
An easy way to install ddclone is as follows:
#library('devtools') #install_github('sohrabsa/ddClone')
Load the library:
library(ddclone)
Run ddClone over simulated data:
data(dollo.10.48.0.f0.gl0) ddCloneRes <- ddclone(dataObj = dollo.10.48.0.f0.gl0, outputPath = './output/dollo.0/', tumourContent = 1.0, numOfIterations = 10, thinning = 1, burnIn = 1, seed = 1)
Display the result:
df <- ddCloneRes$df expPath <- ddCloneRes$expPath
Evaluate against the gold standard:
data(dollo.10.48.0.f0.gl0) nMut <- length(dollo.10.48.0.f0.gl0$mutPrevalence) goldStandard <- data.frame(mutID = 1:nMut, clusterID = relabel.clusters(as.vector(dollo.10.48.0.f0.gl0$mutPrevalence)), phi = as.vector(dollo.10.48.0.f0.gl0$mutPrevalence))
Note that in this example the data was packaged in such a way that it contained the gold standard.
Evaluate clustering:
(clustScore <- evaluate.clustering(goldStandard$clusterID, df$clusterID))
Evaluate prevalence estimates:
(phiScore <- mean(abs(goldStandard$phi - df$phi)))
Save the result:
score <- data.frame(clustScore, phiMeanError = phiScore) write.table(score, file.path(expPath, 'result-scores.csv'))
ddClone's input object is a list of 3 elements, mutCounts
, psi
, and filteredMutMatrix
.
We use the simulated data from the Generalized Dollo model:
require(xlsx) intputFilePath <- system.file("extdata", "inputs_simulated.xlsx", package = "ddclone")
Read the genotype-mutation matrix:
genDat <- read.xlsx(file = intputFilePath, sheetName = 'seed1_genotypes', row.names = T) genDatMutList <- colnames(genDat)
Read the bulk data:
bulkDat <- read.xlsx(file = intputFilePath, sheetName = 'seed_1_allele_counts', row.names = T) bulkMutList <- as.vector(bulkDat$mutation_id) rownames(bulkDat) <- bulkMutList
Generate the ddClone compatible data object:
ddCloneInputObj <- make.ddclone.input(bulkDat = bulkDat, genDat = genDat, outputPath = './output/dollo.0/', nameTag = '')
Inspect the data object:
str(ddCloneInputObj, max.level = 1)
Now we can run the analysis similar to sample 1 above.
ddCloneRes <- ddclone(dataObj = ddCloneInputObj, outputPath = './output/dollo.0/', tumourContent = 1.0, numOfIterations = 10, thinning = 1, burnIn = 1, seed = 1)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.