```r
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```
In this package, we use ecological diversity methods to estimate tumor heterogeneity (TH) from the variant allele frequencies (VAFs) of mutated loci.
The function `inferHeterogeneityPlus` estimates TH with two different families of methods from the package vegan:

- Diversity indices
- Taxonomic indices
See also http://www.coastalwiki.org/wiki/Measurements_of_biodiversity for the concepts.
The vegan function `diversity` computes the most commonly used diversity indices:
$H=-\sum_{i=1}^{S} p_i \log p_i$ Shannon index (1)

$D=\frac{1}{\sum_{i=1}^{S} p_i^2}$ Inverse Simpson index (2)
Here $p_i$ is the proportion of species $i$ and $S$ is the number of species. For tumor data, the VAFs of the mutated loci in a tumor are assigned to $S$ bins, and $p_i$ is the proportion of mutated loci whose VAF falls into the $i$-th bin. We use 10 bins (the parameter `bin_size` controls the number of bins), which gives enough resolution to represent the distribution of VAFs.
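The binning step and Eqs. 1 and 2 can be sketched in base R as follows. This is a minimal illustration, not the `inferHeterogeneityPlus` implementation: the helper `vaf_diversity()` is hypothetical, and it assumes equal-width bins covering [0, 1] and that VAF values above 1 are percentages. Natural logarithms are used, matching the default of vegan's `diversity()`.

```r
# Hypothetical helper: bin VAFs and compute Shannon (Eq. 1) and inverse Simpson (Eq. 2).
# Assumptions: equal-width bins on [0, 1]; values above 1 are percentages.
vaf_diversity <- function(vaf, n_bins = 10) {
  vaf <- vaf[!is.na(vaf)]
  # Assumption: rescale percentages (e.g. 45.7) to fractions (0.457)
  if (max(vaf) > 1) vaf <- vaf / 100
  # Assign each VAF to one of n_bins equal-width bins covering [0, 1]
  bins <- cut(vaf, breaks = seq(0, 1, length.out = n_bins + 1),
              include.lowest = TRUE)
  # table() on a factor keeps empty bins, so counts has length n_bins
  counts <- as.vector(table(bins))
  # p_i: proportion of mutated loci in bin i; empty bins add nothing to Eqs. 1-2
  p <- counts[counts > 0] / sum(counts)
  c(shannon = -sum(p * log(p)),   # Eq. 1, natural log
    invsimpson = 1 / sum(p^2))    # Eq. 2
}

# Four loci in bins 1, 2, 2 and 4: p = (1/4, 1/2, 1/4)
vaf_diversity(c(0.05, 0.15, 0.15, 0.35))  # shannon = 1.5 * log(2) ≈ 1.04, invsimpson = 8/3 ≈ 2.67
```

As a cross-check on a vector of bin counts, `vegan::diversity(counts, index = "shannon")` and `vegan::diversity(counts, index = "invsimpson")` should return the same two values.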
The simple diversity indices above consider only species identity: all species are treated as equally different from each other. In contrast, taxonomic diversity indices also take into account how different the species are.
$\Delta=\frac{\sum\sum_{i<j} \omega_{ij} X_i X_j}{n(n-1)/2}$ Taxonomic diversity (3)

$\Delta^*=\frac{\sum\sum_{i<j} \omega_{ij} X_i X_j}{\sum\sum_{i<j} X_i X_j}$ Taxonomic distinctness (4)
In these equations, the double sums run over pairs of species $i$ and $j$ with $i<j$; $\omega_{ij}$ is the taxonomic distance between taxa $i$ and $j$, $X_i$ is the abundance of species $i$, and $n$ is the total abundance at a site.
For tumor data, the distance between adjacent bins is set to 1, so the distance between bins $i$ and $j$ is $|i-j|$. For example, with 5 bins, the distance between bin 1 and bin 5 is 4. If the counts of mutated loci in the 5 bins are (2, 4, 0, 4, 2) or (0, 2, 4, 4, 3), the former has higher taxonomic diversity than the latter.
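The worked example can be checked with a short base-R sketch of Eqs. 3 and 4. The helper `taxo_indices()` is hypothetical and computes the formulas directly with $\omega_{ij} = |i-j|$; vegan provides these indices through `taxondive()`, but how THindex calls it is not assumed here.

```r
# Hypothetical helper: taxonomic diversity (Eq. 3) and distinctness (Eq. 4)
# for a vector of bin counts, assuming the distance between bins i and j is |i - j|.
taxo_indices <- function(x) {
  S <- length(x)
  n <- sum(x)
  # omega[i, j] = |i - j|: adjacent bins are at distance 1
  omega <- abs(outer(seq_len(S), seq_len(S), "-"))
  # xx[i, j] = X_i * X_j
  xx <- outer(x, x)
  # Restrict both double sums to pairs with i < j
  upper <- upper.tri(xx)
  num <- sum(omega[upper] * xx[upper])
  c(delta = num / (n * (n - 1) / 2),      # Eq. 3: taxonomic diversity
    delta_star = num / sum(xx[upper]))    # Eq. 4: taxonomic distinctness
}

taxo_indices(c(2, 4, 0, 4, 2))  # delta = 112/66 ≈ 1.70, delta_star = 112/52 ≈ 2.15
taxo_indices(c(0, 2, 4, 4, 3))  # delta = 94/78 ≈ 1.21, delta_star = 94/62 ≈ 1.52
```

Both indices are higher for (2, 4, 0, 4, 2) because its loci sit in bins that are far apart, whereas (0, 2, 4, 4, 3) concentrates them in neighbouring bins.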
We use the function `inferHeterogeneityPlus` in THindex to estimate tumor heterogeneity.
```r
library(THindex)
library(maftools)

# Read the MAF data with read.maf(), a function from maftools
laml.maf <- system.file("extdata", "tcga_laml.maf.gz", package = "maftools")
laml <- read.maf(maf = laml.maf)
```
The function `inferHeterogeneityPlus` is modified from the function `inferHeterogeneity` of maftools, and the input parameters of the two functions largely overlap.
The parameter `index` of `inferHeterogeneityPlus` selects which TH indices are computed. If `index = "diversity"`, the Shannon and inverse Simpson indices are calculated (Eqs. 1 and 2).
```r
# Shannon and inverse Simpson indices (Eqs. 1 and 2)
TCGA.ab.het <- inferHeterogeneityPlus(maf = laml, vafCol = 'i_TumorVAF_WU', index = "diversity")
knitr::kable(TCGA.ab.het$diveristy)
```
If `index = "taxonomic"`, taxonomic diversity and taxonomic distinctness are calculated (Eqs. 3 and 4).
```r
# Taxonomic diversity and taxonomic distinctness (Eqs. 3 and 4)
TCGA.ab.het1 <- inferHeterogeneityPlus(maf = laml, vafCol = 'i_TumorVAF_WU', index = "taxonomic")
knitr::kable(TCGA.ab.het1$diveristy)
```