knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

1. Installation

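Install DENDRO, which includes DENDROplan, from GitHub using the latest version of R.

# install.packages("devtools")  # run first if devtools is not yet installed
devtools::install_github("zhouzilu/DENDRO")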

The reference dataset from Deng et al. [@Deng2014] is pre-stored in the DENDRO package.

2. Questions & issues

If you have any questions or problems when using DENDROplan, please open a new issue on the GitHub repository (zhouzilu/DENDRO). You can also email the package maintainers; their contact information is below.

3. DENDROplan pipeline

3.1 Overall pipeline

Figure 2 illustrates the overall pipeline. The analysis starts with a designed tree containing a clade of interest (the purple clade in the example). Based on the tree model, the number of cells, the sequencing depth and the sequencing error rate, we simulate single-cell mutation profiles. The scRNA-seq data are sampled from a reference scRNA-seq dataset given the expression level in bulk. The phylogeny computed by DENDRO is then compared with the underlying truth, measured by three statistics: adjusted Rand index (a global accuracy statistic), capture rate (a subclone-specific statistic) and purity (a subclone-specific statistic).

knitr::include_graphics("https://raw.githubusercontent.com/zhouzilu/DENDRO/master/figure/Pkg_FIG-02.jpg")

Figure 2. A flowchart outlining the procedures for DENDROplan.
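To make the simulation step more concrete, here is a toy sketch in base R. It is not DENDROplan's simulator: the function name `simulate_toy_profile`, the Poisson model for coverage, and the binomial model for variant reads with a fixed allele fraction of 0.5 are all assumptions chosen for illustration.

```r
# Toy sketch only: every name and distributional choice here is an assumption,
# not the model implemented in DENDRO.
simulate_toy_profile <- function(genotype, read_depth = 4.5, error_rate = 0.01) {
  # genotype: 0/1 matrix (mutation sites x cells); 1 means the cell carries the mutation
  n_obs <- length(genotype)
  # Total read count per site and cell, assumed Poisson around the mean depth
  total <- matrix(rpois(n_obs, lambda = read_depth), nrow = nrow(genotype))
  # Probability that a read shows the variant allele:
  # 0.5 for an assumed heterozygous mutation, otherwise the sequencing error rate
  p_variant <- ifelse(genotype == 1, 0.5, error_rate)
  variant <- matrix(
    rbinom(n_obs, size = as.vector(total), prob = as.vector(p_variant)),
    nrow = nrow(genotype)
  )
  list(total = total, variant = variant)
}

set.seed(1)
# Two subclones of three cells each; site 1 is shared, site 2 is private to subclone 2
genotype <- rbind(c(1, 1, 1, 1, 1, 1),
                  c(0, 0, 0, 1, 1, 1))
profile <- simulate_toy_profile(genotype)
```

Lowering `read_depth` in this toy model leaves more site and cell pairs with zero coverage, which gives an intuition for why clustering accuracy depends on sequencing depth.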

3.2 Load reference

Our reference dataset is from Deng et al. [@Deng2014], which has high sequencing quality and depth. First, let's load DENDROplan_ref and check the structure of the reference variable ref.

library(DENDRO)
data("DENDROplan_ref")
str(ref)

There are 22958 potential mutation loci in total across 133 cells. As a result, when we simulate trees, the maximum number of cells is 133 and the maximum number of mutations is 22958.

3.3 Tree simulation

3.3.1 Tree type and parameters

The trees that DENDROplan can simulate are shown in Figure 3, together with the corresponding k and subtype values.

knitr::include_graphics("https://raw.githubusercontent.com/zhouzilu/DENDRO/master/figure/Pkg_FIG-03.jpg")

Figure 3. Example tree structures that DENDRO can simulate, with the corresponding parameters.

Numbers on branches give the index into the mutation probability variable lprob, and numbers at nodes give the index into the cell proportion variable kprob. The clade of interest is labeled by a star. Capture rate and purity are measured on that starred cluster. k is the number of subclones. When k=4, the subclones can form two types of trees, differentiated by the subtype parameter.

3.3.2 Accuracy measurements

We measure DENDROplan results by four statistics, including the adjusted Rand index, capture rate and purity introduced in Section 3.1.
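As a rough illustration of three of these statistics, the base R sketch below computes the adjusted Rand index with its standard formula. The capture rate and purity definitions used here are assumptions, since their exact definitions are not given in this section: capture rate is taken as the fraction of the clade's cells that fall in the best-matching estimated cluster, and purity as the fraction of that cluster belonging to the clade. The function names are hypothetical and are not part of DENDRO.

```r
# Adjusted Rand index from the contingency table of two labelings (standard formula)
adjusted_rand_index <- function(truth, estimate) {
  tab <- table(truth, estimate)
  n_pairs <- function(x) sum(choose(as.vector(x), 2))
  sum_cells <- n_pairs(tab)
  sum_rows <- n_pairs(rowSums(tab))
  sum_cols <- n_pairs(colSums(tab))
  expected <- sum_rows * sum_cols / choose(sum(tab), 2)
  max_index <- (sum_rows + sum_cols) / 2
  (sum_cells - expected) / (max_index - expected)
}

# Assumed definitions of capture rate and purity for one clade of interest
clade_capture_and_purity <- function(truth, estimate, clade) {
  in_clade <- truth == clade
  # Estimated cluster that holds most of the clade's cells
  best <- names(which.max(table(estimate[in_clade])))
  in_best <- as.character(estimate) == best
  c(capture_rate = sum(in_clade & in_best) / sum(in_clade),
    purity = sum(in_clade & in_best) / sum(in_best))
}

# Hypothetical labels for ten cells: one cell of subclone 1 is misassigned to cluster 2
truth    <- c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3)
estimate <- c(1, 1, 2, 2, 2, 2, 3, 3, 3, 3)
ari <- adjusted_rand_index(truth, estimate)
clade_stats <- clade_capture_and_purity(truth, estimate, clade = 2)
```

In this toy example every cell of clade 2 is recovered, so the capture rate is 1, but the matching cluster also contains one cell from subclone 1, so the purity is 0.75.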

3.4 Practice

3.4.1 Example I

Assume we have only 100 mutation sites and our tree looks like Figure 3c, with mutations and cells uniformly distributed.

# n: number of mutation sites; k and subtype select the tree shape (see Figure 3)
res <- DENDRO.simulation(n = 100, ref = ref, k = 4, subtype = 1)
res

The result shows the example tree structure as well as the statistics with 95% confidence intervals (CIs).

3.4.2 Example II

In the second example, we specify a similar tree but with customized cell proportions and mutation loads. We want the four clusters to have cell proportions 0.2, 0.2, 0.2 and 0.4, i.e. one major subclone, which we specify with the parameter kprob. We also want the mutation load to be unequally distributed, with branch 4 (specific to cluster 4) carrying more mutations: lprob=c(0.15,0.15,0.15,0.25,0.15,0.15).

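# kprob: cell proportion per cluster; lprob: mutation proportion per branch
res <- DENDRO.simulation(
  kprob = c(0.2, 0.2, 0.2, 0.4),
  lprob = c(0.15, 0.15, 0.15, 0.25, 0.15, 0.15),
  n = 100, ref = ref, k = 4, subtype = 1
)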
res

The example tree is consistent with our input.
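Before running a simulation with custom proportions, it can help to sanity-check the inputs. The sketch below assumes that kprob and lprob are read as proportions that sum to 1, as they do in this example. The 133-cell total is only an illustrative choice (the maximum available in the reference), not a value taken from the call above.

```r
kprob <- c(0.2, 0.2, 0.2, 0.4)                  # cell proportion per cluster
lprob <- c(0.15, 0.15, 0.15, 0.25, 0.15, 0.15)  # mutation proportion per branch
n <- 100                                        # number of mutation sites

# Assumption: both vectors are proportions, so each should sum to 1
stopifnot(isTRUE(all.equal(sum(kprob), 1)),
          isTRUE(all.equal(sum(lprob), 1)))

# Expected number of mutation sites on each branch
expected_sites <- n * lprob
# Expected number of cells per cluster for a hypothetical 133-cell experiment
expected_cells <- 133 * kprob
```

With these inputs, branch 4 is expected to carry about 25 of the 100 mutation sites, against about 15 for each of the other branches.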

3.4.3 Example III

One important reason to use this tool is that it can estimate the clustering accuracy at different sequencing depths. In the original paper, Deng et al. [@Deng2014] collected around 10,000,000 50-bp reads per cell in mice, resulting in 46 mapped reads per mutation site, which is very high depth (Figure 4).

knitr::include_graphics("https://raw.githubusercontent.com/zhouzilu/DENDRO/master/figure/deng_stat-04.jpg")

Figure 4. scRNA-seq library statistics for Deng et al. [@Deng2014]. Modified from original paper.

In practice, 1,000,000 reads per cell is usually considered good depth. Proportionally, that gives around 4.5 mapped reads per mutation site. Let's change the read depth parameter RD and see how well DENDRO performs.

# RD: read depth, in mapped reads per mutation site
res <- DENDRO.simulation(RD = 4.5, n = 100, ref = ref, k = 4, subtype = 1)
res
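The RD value used above comes from a simple proportional scaling, which can be written out as follows. The sketch assumes that the number of mapped reads per mutation site scales linearly with the number of reads per cell. The variable names are illustrative only.

```r
reads_per_cell_ref <- 1e7   # approximate reads per cell in Deng et al.
depth_per_site_ref <- 46    # mapped reads per mutation site at that depth
reads_per_cell_new <- 1e6   # planned reads per cell

# Assumption: depth per mutation site scales linearly with reads per cell
depth_per_site_new <- depth_per_site_ref * reads_per_cell_new / reads_per_cell_ref
```

This gives 4.6 mapped reads per mutation site, in line with the value of around 4.5 passed as RD.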

Feel free to experiment with different parameters. Detailed documentation for the function can be found by typing ??DENDRO.simulation in R.

4. Session info

sessionInfo()

5. References



zhouzilu/DENDRO documentation built on May 21, 2020, 8 p.m.