inferHeterogeneityPlus: Clusters variants and Estimating Tumor Heterogeneity(TH)...
In qingjian1991/THindex: Tumor Heterogeneity Index Estimating

inferHeterogeneityPlus

R Documentation

Clusters variants and estimates Tumor Heterogeneity (TH) based on Variant Allele Frequencies (VAF).

Description

Takes the output generated by read.maf and clusters variants to infer tumor heterogeneity. The function requires VAF for clustering and density estimation; VAF can be on a 0-1 or a 0-100 scale. If copy number information is available, it can optionally be provided as a segmented file (e.g., from Circular Binary Segmentation). Variants in copy-number-altered regions are then ignored.
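As a rough illustration of the VAF handling described above, the sketch below rescales percent-style VAFs to the 0-1 scale and applies a frequency window like the one set by minVaf and maxVaf. The VAF values, the "any value above 1 means percent" rule, and the inclusive window bounds are assumptions made for this example, not the package's documented behavior.

```r
# Hypothetical VAFs reported on the 0-100 scale (made-up values).
vaf <- c(3, 12.5, 25, 48, 51, 97)

# Assumption: any value above 1 means the column is on a percent scale.
if (max(vaf, na.rm = TRUE) > 1) {
  vaf <- vaf / 100
}

# Thresholds on the 0-1 scale, mirroring the minVaf / maxVaf arguments.
minVaf <- 0.05
maxVaf <- 0.95

# Assumption: the window is inclusive at both ends.
keep <- vaf >= minVaf & vaf <= maxVaf
vaf_filtered <- vaf[keep]

print(vaf_filtered)
```

Rescaling before filtering means that one pair of thresholds works whichever scale the input MAF uses.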

Usage

inferHeterogeneityPlus(
  maf,
  tsb = NULL,
  index = "diversity",
  top = 5,
  vafCol = NULL,
  segFile = NULL,
  ignChr = NULL,
  minVaf = 0,
  maxVaf = 1,
  useSyn = FALSE,
  dirichlet = FALSE,
  bin_size = 10
)

Arguments

`maf`	an `MAF` object generated by `read.maf`
`tsb`	specify sample names (Tumor_Sample_Barcodes) for which clustering has to be done.
`index`	the method used to estimate the diversity indices. Must be "diversity" or "taxonomic"; see Details for more information.
`top`	if `tsb` is NULL, uses top n number of most mutated samples. Defaults to 5.
`vafCol`	manually specify the column name holding VAFs. By default the function looks for the column 't_vaf'.
`segFile`	path to CBS segmented copy number file. Column names should be Sample, Chromosome, Start, End, Num_Probes and Segment_Mean (log2 scale).
`ignChr`	ignore these chromosomes in the analysis, e.g., the sex chromosomes chrX and chrY. Default NULL.
`minVaf`	filter out low-frequency variants. Low-VAF variants may be due to sequencing error. Default 0 (on the scale of 0 to 1).
`maxVaf`	filter out high-frequency variants. High-VAF variants may be due to copy number alterations or an impure tumor. Default 1 (on the scale of 0 to 1).
`useSyn`	Use synonymous variants. Default FALSE.
`dirichlet`	Deprecated and no longer supported. Formerly used a nonparametric Dirichlet process for clustering. Default FALSE, which uses finite mixture models.
`bin_size`	number of bins N into which the VAF is divided. Default 10.
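To make the `segFile` layout concrete, here is a minimal sketch that writes a segment file with the six column names listed above. The sample name, coordinates and values are invented, and the tab-delimited layout is an assumption, since the documentation names the columns but not the delimiter.

```r
# Hypothetical segments for one sample; every value here is made up.
seg <- data.frame(
  Sample       = c("SAMPLE-01", "SAMPLE-01"),
  Chromosome   = c("1", "2"),
  Start        = c(1000000L, 5000000L),
  End          = c(2000000L, 9000000L),
  Num_Probes   = c(120L, 340L),
  Segment_Mean = c(0.02, -0.85),   # log2 scale
  stringsAsFactors = FALSE
)

# Assumption: the file is tab-delimited with a header row.
seg_path <- file.path(tempdir(), "example_segments.seg")
write.table(seg, file = seg_path, sep = "\t", quote = FALSE, row.names = FALSE)

# Read it back to confirm the header matches the expected column names.
seg_back <- read.delim(seg_path, stringsAsFactors = FALSE)
print(names(seg_back))

# The path could then be passed as segFile (not run here):
# het <- inferHeterogeneityPlus(maf = laml, tsb = "SAMPLE-01", segFile = seg_path)
```

For the copy number filter to have any effect, the Sample values presumably need to match the Tumor_Sample_Barcode values in the MAF.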

Details

This function clusters variants based on VAF to estimate the univariate density and assign each variant to a cluster. By default, clustering uses parametric finite mixture models. A second method based on nonparametric infinite mixture models (Dirichlet process) is deprecated and no longer supported (see the `dirichlet` argument).
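The finite mixture clustering described above can be sketched with the mclust package (the Fraley and Raftery reference). The documentation does not say which backend the function calls, so using mclust is an assumption, as are the simulated VAFs and the range of component counts G = 1:5.

```r
# Simulated VAFs from two hypothetical clones (made-up parameters).
set.seed(1)
vaf <- c(rnorm(60, mean = 0.45, sd = 0.03),
         rnorm(40, mean = 0.20, sd = 0.03))

if (requireNamespace("mclust", quietly = TRUE)) {
  library(mclust)

  # Fit univariate Gaussian mixtures with 1 to 5 components;
  # mclust picks the number of components by BIC.
  fit <- Mclust(vaf, G = 1:5, verbose = FALSE)

  print(fit$G)                      # number of clusters selected
  print(fit$parameters$mean)        # mean VAF of each cluster
  print(table(fit$classification))  # variants assigned to each cluster
}
```

Each variant receives one cluster label, and the cluster means play the role of the values reported in clusterMeans.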

Estimate the TH Indices

The TH indices are based on diversity indices or on taxonomic diversity; see the vegan package for details.

When index = "diversity", the Shannon and inverse Simpson indices are estimated.

When index = "taxonomic", the taxonomic diversity (Delt) and taxonomic distinctness (Dstar) are estimated.
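For index = "diversity", the two indices can be sketched in base R as follows. The example assumes the indices are computed on the number of variants falling into each of bin_size equal-width VAF bins; the documentation does not spell out this step, and the VAF values are invented. The formulas are the standard ones, with the natural logarithm for Shannon.

```r
# Hypothetical VAFs on the 0-1 scale (made-up values).
vaf <- c(0.05, 0.12, 0.18, 0.22, 0.25, 0.31,
         0.38, 0.41, 0.44, 0.47, 0.49, 0.52)
bin_size <- 10

# Assumption: equal-width bins spanning [0, 1].
breaks <- seq(0, 1, length.out = bin_size + 1)
counts <- as.vector(table(cut(vaf, breaks = breaks, include.lowest = TRUE)))

# Proportion of variants in each non-empty bin.
p <- counts[counts > 0] / sum(counts)

# Shannon index: H = -sum(p * ln p)
shannon <- -sum(p * log(p))

# Inverse Simpson index: 1 / sum(p^2)
inv_simpson <- 1 / sum(p^2)

print(c(shannon = shannon, inv_simpson = inv_simpson))
```

Both indices increase as variants spread more evenly across bins, so higher values point to a more heterogeneous VAF distribution.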

Value

A list of clustering tables, including:

clusterData: clustering data and TH indices.

clusterMeans: cluster means.

diveristy: summary of the TH indices.

Author(s)

Qingjian Chen modified the code of inferHeterogeneity and added the TH index. Email: chenqingjian2010@163.com

References

Fraley C, Raftery AE. Model-based clustering, discriminant analysis and density estimation. Journal of the American Statistical Association. 2002;97:611-631.

Jara A, Hanson TE, Quintana FA, Muller P, Rosner GL. DPpackage: Bayesian Semi- and Nonparametric Modeling in R. Journal of statistical software. 2011;40(5):1-30.

Olshen AB, Venkatraman ES, Lucito R, Wigler M. Circular binary segmentation for the analysis of array-based DNA copy number data. Biostatistics. 2004;5(4):557-72.

Examples

## Not run: 

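library(THindex)
library(maftools)

# Load the TCGA LAML example MAF shipped with maftools
laml.maf <- system.file("extdata", "tcga_laml.maf.gz", package = "maftools")
laml <- read.maf(maf = laml.maf)

# Cluster variants of the most mutated samples (top = 5 by default),
# reading VAFs from the 'i_TumorVAF_WU' column
TCGA.ab.het <- inferHeterogeneityPlus(maf = laml, vafCol = 'i_TumorVAF_WU', index = "diversity")

# Summary of the TH indices (the element name is spelled 'diveristy', as listed under Value)
print(TCGA.ab.het$diveristy)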


## End(Not run)

qingjian1991/THindex documentation built on July 14, 2024, 10:33 a.m.