inferHeterogeneityPlus: Cluster variants and estimate Tumor Heterogeneity (TH)...

inferHeterogeneityPlus    R Documentation

Cluster variants and estimate Tumor Heterogeneity (TH) based on Variant Allele Frequencies (VAF).

Description

Takes the output of read.maf and clusters variants to infer tumor heterogeneity. The function requires VAF for clustering and density estimation. VAF can be on a scale of 0-1 or 0-100. Optionally, if copy number information is available, it can be provided as a segmented file (e.g., from Circular Binary Segmentation). Variants in copy number altered regions are then ignored.
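As a rough illustration of what ignoring variants in copy number altered regions means, the base R sketch below flags toy variants that fall inside altered segments. It is not the package's internal code: the segment table uses the column names listed under the segFile argument, while the `Position` column, the toy values, and the 0.3 cutoff on the absolute log2 Segment_Mean are assumptions made for this example.

```r
# Toy segment table using the column names expected for segFile.
seg <- data.frame(
  Sample = c("S1", "S1", "S1"),
  Chromosome = c("1", "1", "2"),
  Start = c(1, 5000001, 1),
  End = c(5000000, 9000000, 8000000),
  Num_Probes = c(120, 85, 200),
  Segment_Mean = c(0.02, 0.85, -0.05),
  stringsAsFactors = FALSE
)

# Toy variants for the same sample (hypothetical positions).
variants <- data.frame(
  Chromosome = c("1", "1", "2"),
  Position = c(1200000, 6500000, 300000),
  stringsAsFactors = FALSE
)

# Assumption: a segment counts as copy number altered when
# |Segment_Mean| exceeds 0.3 on the log2 scale.
cn_cutoff <- 0.3
altered <- seg[abs(seg$Segment_Mean) > cn_cutoff, ]

# Flag variants that fall inside any altered segment.
in_altered <- vapply(seq_len(nrow(variants)), function(i) {
  any(altered$Chromosome == variants$Chromosome[i] &
        altered$Start <= variants$Position[i] &
        altered$End >= variants$Position[i])
}, logical(1))

# Keep only variants outside altered segments.
kept <- variants[!in_altered, ]
print(kept)
```

In this toy data only the second variant lies in an altered segment, so it is the one dropped before clustering.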

Usage

inferHeterogeneityPlus(
  maf,
  tsb = NULL,
  index = "diversity",
  top = 5,
  vafCol = NULL,
  segFile = NULL,
  ignChr = NULL,
  minVaf = 0,
  maxVaf = 1,
  useSyn = FALSE,
  dirichlet = FALSE,
  bin_size = 10
)

Arguments

maf

an MAF object generated by read.maf

tsb

sample names (Tumor_Sample_Barcodes) for which clustering is performed. Default NULL.

index

method used to estimate the diversity indices. Must be "diversity" or "taxonomic"; see Details for more information.

top

if tsb is NULL, uses the top n most mutated samples. Defaults to 5.

vafCol

manually specify the column name for VAFs. By default, looks for the column 't_vaf'.

segFile

path to CBS segmented copy number file. Column names should be Sample, Chromosome, Start, End, Num_Probes and Segment_Mean (log2 scale).

ignChr

chromosomes to ignore in the analysis, e.g., the sex chromosomes chrX and chrY. Default NULL.

minVaf

filter low frequency variants. Low VAF variants may be due to sequencing error. Default 0 (on the scale of 0 to 1).

maxVaf

filter high frequency variants. High VAF variants may be due to copy number alterations or an impure tumor. Default 1 (on the scale of 0 to 1).

useSyn

use synonymous variants. Default FALSE.

dirichlet

Deprecated and no longer supported. Formerly used a nonparametric Dirichlet process for clustering. Default FALSE, which uses finite mixture models.

bin_size

number of bins into which the VAF is divided. Default 10.
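The effect of the VAF-related arguments can be pictured with a small base R sketch. This is not the package's internal code: the helper name `prepare_vaf`, the rule that any value above 1 signals a 0-100 scale, and the inclusive bounds are all assumptions made for illustration.

```r
# Hypothetical helper (not part of THindex): rescale VAFs given on a
# 0-100 scale to 0-1, then keep values within [min_vaf, max_vaf].
prepare_vaf <- function(vaf, min_vaf = 0, max_vaf = 1) {
  vaf <- as.numeric(vaf)
  vaf <- vaf[!is.na(vaf)]
  # Assumption: any value above 1 means the column is on a 0-100 scale.
  if (length(vaf) > 0 && max(vaf) > 1) {
    vaf <- vaf / 100
  }
  # Assumption: both bounds are inclusive.
  vaf[vaf >= min_vaf & vaf <= max_vaf]
}

vaf_pct <- c(2, 12.5, 48, 51, 97, NA)
filtered <- prepare_vaf(vaf_pct, min_vaf = 0.05, max_vaf = 0.95)
print(filtered)
```

Here the values 2 and 97 (on the 0-100 scale) fall outside the 0.05 to 0.95 window after rescaling and are dropped, along with the missing value.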

Details

This function clusters variants based on VAF to estimate a univariate density and a cluster classification. Two clustering methods are described: the default uses parametric finite mixture models, and the other uses nonparametric infinite mixture models (Dirichlet process). The Dirichlet process method is deprecated; see the dirichlet argument.
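To make the idea of finite mixture clustering on VAF concrete, here is a minimal base R sketch of an EM fit for a univariate Gaussian mixture. It is an illustration under stated assumptions, not the routine used by this function: the helper `fit_mixture`, the fixed number of components, the quantile-based starting values, and the simulated VAFs are all hypothetical.

```r
# Minimal EM for a univariate Gaussian mixture with k components.
# Illustrative only; the number of components is fixed by the caller.
fit_mixture <- function(x, k = 2, iters = 200) {
  # Start from quantile-based means, a common spread, and equal weights.
  mu <- as.numeric(quantile(x, probs = (seq_len(k) - 0.5) / k))
  sigma <- rep(sd(x), k)
  w <- rep(1 / k, k)
  for (it in seq_len(iters)) {
    # E-step: responsibility of each component for each point.
    dens <- sapply(seq_len(k), function(j) w[j] * dnorm(x, mu[j], sigma[j]))
    resp <- dens / rowSums(dens)
    # M-step: update weights, means, and standard deviations.
    nk <- colSums(resp)
    w <- nk / length(x)
    mu <- colSums(resp * x) / nk
    sigma <- sqrt(colSums(resp * outer(x, mu, "-")^2) / nk)
    sigma <- pmax(sigma, 0.01)  # guard against collapsing components
  }
  list(means = mu, weights = w,
       cluster = max.col(resp, ties.method = "first"))
}

# Simulated VAFs: a clonal cluster near 0.45 and a subclonal one near 0.20.
set.seed(1)
x <- c(rnorm(150, mean = 0.45, sd = 0.03), rnorm(100, mean = 0.20, sd = 0.03))
x <- pmin(pmax(x, 0), 1)

fit <- fit_mixture(x, k = 2)
print(round(sort(fit$means), 2))
print(table(fit$cluster))
```

Each variant is assigned to the component with the highest responsibility, which corresponds to the cluster classification described above; the fitted means play the role of cluster means.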

Estimate the TH Indices

The TH indices are based on diversity indices or on taxonomic diversity; see the vegan package for details.

When index = "diversity", the Shannon and reverse Simpson indices are estimated.

When index = "taxonomic", the taxonomic diversity (Delt) and taxonomic distinctness (Dstar) are estimated.
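As a sketch of how diversity indices can be derived from VAFs, the base R example below bins the VAFs and computes the Shannon and inverse Simpson indices on the bin proportions. Several points are assumptions rather than documented behavior: that the indices are computed on bin counts, that "reverse Simpson" corresponds to the inverse Simpson index 1 / sum(p^2), and the helper name `vaf_diversity`.

```r
# Hypothetical helper: bin VAFs (0-1 scale) and compute diversity
# indices on the proportion of variants in each bin.
vaf_diversity <- function(vaf, bin_size = 10) {
  breaks <- seq(0, 1, length.out = bin_size + 1)
  counts <- table(cut(vaf, breaks = breaks, include.lowest = TRUE))
  p <- as.numeric(counts) / sum(counts)
  p <- p[p > 0]                      # empty bins contribute nothing
  c(shannon = -sum(p * log(p)),      # Shannon index
    inv_simpson = 1 / sum(p^2))      # inverse Simpson index
}

# Four variants falling in four different bins.
vaf <- c(0.05, 0.15, 0.25, 0.35)
idx <- vaf_diversity(vaf, bin_size = 10)
print(idx)
```

With four variants spread evenly over four bins, the Shannon index equals log(4) and the inverse Simpson index equals 4; a sample whose variants all fall in one bin would give 0 and 1 respectively.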

Value

A list of clustering tables, including:

clusterData: clustering data and TH indices.

clusterMeans: cluster means.

diveristy: summary of TH indices (the element name is spelled this way in the returned list).

Author(s)

Qingjian Chen modified the code of inferHeterogeneity and added the TH index. Email: chenqingjian2010@163.com

References

Fraley C, Raftery AE. Model-based Clustering, Discriminant Analysis and Density Estimation. Journal of the American Statistical Association. 2002;97:611-631.

Jara A, Hanson TE, Quintana FA, Muller P, Rosner GL. DPpackage: Bayesian Semi- and Nonparametric Modeling in R. Journal of statistical software. 2011;40(5):1-30.

Olshen AB, Venkatraman ES, Lucito R, Wigler M. Circular binary segmentation for the analysis of array-based DNA copy number data. Biostatistics. 2004;5(4):557-72.

See Also

plotClusters, inferHeterogeneity

Examples

## Not run: 

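# Load THindex for inferHeterogeneityPlus and maftools for read.maf
library(THindex)
library(maftools)
# Read the TCGA LAML example MAF shipped with maftools
laml.maf <- system.file("extdata", "tcga_laml.maf.gz", package = "maftools")
laml <- read.maf(maf = laml.maf)
# Cluster variants using the VAF column 'i_TumorVAF_WU' and estimate the diversity indices
TCGA.ab.het <- inferHeterogeneityPlus(maf = laml, vafCol = 'i_TumorVAF_WU', index = "diversity")
# Print the summary of TH indices (the list element is named 'diveristy')
print(TCGA.ab.het$diveristy)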


## End(Not run)

qingjian1991/THindex documentation built on July 14, 2024, 10:33 a.m.