ddClone: Joint statistical inference of clonal populations from single cell and bulk tumour sequencing data

ddClone is a statistical framework that leverages data from both single cell and bulk sequencing strategies. The ddClone [@ddclone] approach is predicated on the notion that single cell sequencing data will inform and improve clustering of allele fractions derived from bulk sequencing data in a joint statistical model.
ddClone combines a Bayesian non-parametric prior informed by single cell data with a likelihood model based on bulk sequencing data to infer clonal population architecture. Intuitively, the prior encourages genomic loci with co-occurring mutations in single cells to cluster together. Using a cell-locus binary matrix from single cell sequencing, ddClone computes a distance matrix between mutations using the Jaccard distance with exponential decay. This matrix is then used as a prior for inference over mutation clusters and their prevalences from deeply sequenced bulk data in a distance-dependent Chinese restaurant process [@ddcrp] framework. The output of the model is the most probable set of mutational clusters present and the prevalence of each mutation in the population. The code is based on the ddCRP model, as introduced and implemented in [@ddcrp].
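The prior construction described above can be illustrated with a minimal sketch. It assumes a toy cell-by-locus binary matrix with made-up values, and it assumes the decay takes the form exp(-d / a) with a hypothetical decay parameter `a`; the function names `jaccard_distance` and `exp_decay` are illustrative and are not part of the ddclone API, whose exact decay form may differ.

```r
# Toy cell-by-locus binary matrix: rows are cells, columns are mutations.
# A 1 means the mutation was observed in that cell. Values are made up.
cellLocus <- matrix(c(1, 1, 0,
                      1, 1, 0,
                      0, 0, 1,
                      1, 0, 1),
                    nrow = 4, byrow = TRUE,
                    dimnames = list(paste0("cell", 1:4), paste0("mut", 1:3)))

# Pairwise Jaccard distance between mutations (columns).
jaccard_distance <- function(m) {
  inter  <- crossprod(m)                         # cells carrying both mutations
  counts <- diag(inter)                          # cells carrying each mutation
  union  <- outer(counts, counts, "+") - inter   # cells carrying either mutation
  d <- 1 - inter / union
  d[union == 0] <- 0  # assumption: mutations seen in no cell get distance 0
  d
}

# Exponential decay turning a distance into an affinity in (0, 1].
# The parameter `a` is hypothetical and only controls how fast affinity drops.
exp_decay <- function(d, a = 0.5) exp(-d / a)

distMat  <- jaccard_distance(cellLocus)
affinity <- exp_decay(distMat)
```

In this toy matrix, mut1 and mut2 co-occur in two of the three cells that carry either of them, so they are close and receive a high affinity, while mut2 and mut3 never co-occur and sit at the maximum distance of 1.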

Install the package

ddClone can be installed from GitHub using devtools:

library('devtools')
install_github('sohrabsa/ddClone')

A simple example

1. Simulated Data

Load the library:

library(ddclone)

Run ddClone over simulated data:

data(dollo.10.48.0.f0.gl0)
ddCloneRes <- ddclone(dataObj = dollo.10.48.0.f0.gl0,
              outputPath = './output/dollo.0/', tumourContent = 1.0,
              numOfIterations = 10, thinning = 1, burnIn = 1,
              seed = 1)
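The arguments numOfIterations, burnIn, and thinning are the usual controls of an MCMC sampler. The sketch below shows which iterations would be retained under the common convention (discard the first burnIn iterations, then keep every thinning-th one). The helper `retained_iterations` is hypothetical, and the package's exact indexing is an assumption rather than something stated in this document.

```r
# Hypothetical helper: indices of the MCMC iterations kept for inference
# under the common burn-in / thinning convention. This is an illustration,
# not ddclone's internal code.
retained_iterations <- function(numOfIterations, burnIn, thinning) {
  if (burnIn >= numOfIterations) return(integer(0))
  seq(from = burnIn + 1, to = numOfIterations, by = thinning)
}

# Settings used in the example above: 9 of the 10 iterations are kept.
demoKept <- retained_iterations(numOfIterations = 10, burnIn = 1, thinning = 1)

# A longer run (made-up settings) keeps every 10th sample after burn-in.
longKept <- retained_iterations(numOfIterations = 1000, burnIn = 100, thinning = 10)
```

The very short chain in the example is presumably chosen to keep the demonstration fast; a real analysis would normally use far more iterations.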

Extract the results and the output path of the run:

df <- ddCloneRes$df
expPath <- ddCloneRes$expPath

Evaluate against the gold standard:

data(dollo.10.48.0.f0.gl0)
nMut <- length(dollo.10.48.0.f0.gl0$mutPrevalence)
goldStandard <- data.frame(mutID = 1:nMut,
                           clusterID = relabel.clusters(as.vector(dollo.10.48.0.f0.gl0$mutPrevalence)),
                           phi = as.vector(dollo.10.48.0.f0.gl0$mutPrevalence))

Note that this example dataset was packaged together with its gold standard, namely the true prevalence of each mutation.
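The gold standard above derives cluster labels from the prevalences: mutations with the same true prevalence are treated as belonging to the same cluster. A minimal sketch of that idea with made-up prevalences follows. It uses base R's `match` as a stand-in, since the exact labelling scheme of relabel.clusters is not described here and may differ.

```r
# Made-up true prevalences for five mutations.
phi <- c(0.8, 0.3, 0.8, 0.1, 0.3)

# Stand-in for relabel.clusters (assumption): give each distinct prevalence
# a consecutive integer label in order of first appearance.
clusterID <- match(phi, unique(phi))

toyGoldStandard <- data.frame(mutID = seq_along(phi),
                              clusterID = clusterID,
                              phi = phi)
```

Here mutations 1 and 3 share one label and mutations 2 and 5 share another, because their prevalences are equal.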

Evaluate clustering:

(clustScore <- evaluate.clustering(goldStandard$clusterID, df$clusterID))
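Cluster labels are arbitrary, so a clustering score has to compare partitions rather than raw IDs. As an illustration of one such label-invariant score, the sketch below computes the adjusted Rand index in base R. Which metrics evaluate.clustering actually reports is not stated in this document, and the function `adjusted_rand` is a hypothetical helper, not part of ddclone.

```r
# Adjusted Rand index between two labelings of the same items.
# Illustrative only; evaluate.clustering may report different metrics.
adjusted_rand <- function(a, b) {
  tab <- table(a, b)                         # contingency table of the two labelings
  choose2 <- function(x) x * (x - 1) / 2     # number of pairs among x items
  sumCells <- sum(choose2(tab))              # pairs grouped together in both
  sumRows  <- sum(choose2(rowSums(tab)))     # pairs grouped together in a
  sumCols  <- sum(choose2(colSums(tab)))     # pairs grouped together in b
  expected <- sumRows * sumCols / choose2(length(a))
  maxIndex <- (sumRows + sumCols) / 2
  (sumCells - expected) / (maxIndex - expected)
}

truth     <- c(1, 1, 2, 2, 3)
relabeled <- c(2, 2, 3, 3, 1)   # same partition, different label names
scoreSame <- adjusted_rand(truth, relabeled)

scoreDiff <- adjusted_rand(c(1, 1, 2, 2), c(1, 2, 1, 2))  # conflicting partitions
```

A score of 1 means the two partitions are identical up to renaming of the labels; values near or below 0 indicate agreement no better than chance.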

Evaluate prevalence estimates:

(phiScore <- mean(abs(goldStandard$phi - df$phi)))

Save the result:

score <- data.frame(clustScore, phiMeanError = phiScore)
write.csv(score, file.path(expPath, 'result-scores.csv'))

2. Create a ddclone input object

ddClone's input object is a list of three elements: mutCounts, psi, and filteredMutMatrix. Here we build one from the simulated data generated under the Generalized Dollo model:

library(xlsx)
inputFilePath <- system.file("extdata", "inputs_simulated.xlsx", package = "ddclone")

Read the genotype-mutation matrix:

genDat <- read.xlsx(file = inputFilePath, sheetName = 'seed1_genotypes', row.names = TRUE)
genDatMutList <- colnames(genDat)

Read the bulk data:

bulkDat <- read.xlsx(file = inputFilePath, sheetName = 'seed_1_allele_counts', row.names = TRUE)
bulkMutList <- as.vector(bulkDat$mutation_id)
rownames(bulkDat) <- bulkMutList
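Before combining the two sources, it can help to check that the mutation IDs in the genotype matrix and in the bulk table line up. The sketch below shows such a check on made-up ID vectors; whether make.ddclone.input requires matching IDs or filters mismatches itself is not stated here, so this is an optional sanity check under that assumption.

```r
# Made-up mutation IDs standing in for genDatMutList and bulkMutList.
toyGenMuts  <- c("mut1", "mut2", "mut3")
toyBulkMuts <- c("mut2", "mut3", "mut4")

sharedMuts   <- intersect(toyBulkMuts, toyGenMuts)  # present in both sources
onlyInBulk   <- setdiff(toyBulkMuts, toyGenMuts)    # no single cell genotype
onlyInSingle <- setdiff(toyGenMuts, toyBulkMuts)    # no bulk allele counts

if (length(onlyInBulk) > 0 || length(onlyInSingle) > 0) {
  message("Mutation IDs differ between bulk and single cell data: ",
          paste(c(onlyInBulk, onlyInSingle), collapse = ", "))
}
```

With the real data, the same calls can be applied to genDatMutList and bulkMutList directly.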

Generate the ddClone compatible data object:

ddCloneInputObj <- make.ddclone.input(bulkDat = bulkDat, genDat = genDat, outputPath = './output/dollo.0/', nameTag = '')

Inspect the data object:

str(ddCloneInputObj, max.level = 1)

Now we can run the analysis as in example 1 above:

ddCloneRes <- ddclone(dataObj = ddCloneInputObj,
              outputPath = './output/dollo.0/', tumourContent = 1.0,
              numOfIterations = 10, thinning = 1, burnIn = 1,
              seed = 1)

References

sohrabsa/ddclone documentation built on May 30, 2019, 6:08 a.m.
