Bayesian Estimation of Tumor Purity and Clonality
Description
PurBayes is an iterative Bayesian algorithm that simultaneously estimates tumor purity and clonality with finite mixture models, using the MCMC software JAGS to obtain posterior samples for inference. Guided by a penalized deviance criterion, PurBayes fits models with an increasing number of variant populations until an optimal fit is achieved.
Usage
Arguments
N 
numeric vector of total reads for each somatic mutation from the tumor tissue NGS data 
Y 
numeric vector of mutant allele supporting read counts for each somatic mutation from the tumor tissue NGS data 
M 
optional numeric vector of total reads for germline heterozygous variants. PurBayes uses these to estimate the nonreference allele mapping rate and thereby account for mapping bias 
Z 
optional numeric vector of alternate allele reads for germline heterozygous variants, corresponding to 
pop.max 
Maximum number of variant populations allowed in the iterative modeling procedure. Defaults to 5. 
prior 
Optional prior distribution for λ_J under the homogeneous tumor model. If 
burn.in 
Number of MCMC draws that are discarded as burn-in. Defaults to 50000. 
n.post 
Number of MCMC draws that are sampled for posterior inference. Defaults to 10000. 
fn.jags 
File location and name to which 
plot 
If 
Details
For a given tumor purity level λ, PurBayes assumes a binomial-binomial mixture model for the tumor sequence reads that support the alternate allele, Y_i^t \sim Bin(N_i, λ/2). This model is fit to the data under the assumption of tumor homogeneity. PurBayes also supports intratumor heterogeneity, in which the tumor tissue contains additional subclonal variant populations, each with its own 'purity' λ_j < λ for j = 1,...,J-1, with λ_J \equiv λ.
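As a rough illustration of the homogeneous model above, the following base R sketch simulates mutant read counts and recovers λ with the closed-form binomial maximum-likelihood estimate, 2 * sum(Y) / sum(N). This is not what PurBayes does internally (it samples the posterior with JAGS); the variable names, seed, and simulated values are assumptions made for the example.

```r
# Sketch only: simulate Y_i ~ Bin(N_i, lambda/2) and estimate lambda by
# maximum likelihood. PurBayes itself uses MCMC via JAGS, not this shortcut.
set.seed(1)                                   # arbitrary seed for reproducibility
n.var       <- 200                            # hypothetical number of somatic variants
N.sim       <- round(runif(n.var, 20, 200))   # total reads per variant
lambda.true <- 0.75                           # assumed tumor purity
Y.sim       <- rbinom(n.var, N.sim, lambda.true / 2)

# With a common success probability p = lambda/2, the binomial MLE of p is
# sum(Y)/sum(N), so the MLE of lambda is twice that ratio.
lambda.hat <- 2 * sum(Y.sim) / sum(N.sim)
```

A point estimate like this ignores subclonal populations entirely, which is why the mixture formulation is needed when the tumor is heterogeneous.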
The probability that a given variant corresponds to the j^{th} population is given by κ_j, and \bm{κ}=(κ_1,…,κ_J) follows a Dirichlet prior, π(\bm{κ}) \sim Dirichlet(α_1,…,α_J), for a given number of variant populations J. PurBayes applies a diffuse prior on \bm{κ}, such that α_1=…=α_J=1. While the user may specify a particular prior for λ under a homogeneous tumor, PurBayes defaults to π(λ_j) \sim Uniform(0,1) for all j, and uses a sort function to avoid label switching.
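The default priors described above can be sketched in base R by drawing one set of parameters. This is an illustration rather than the package's JAGS model: the value of J, the seed, and the names kappa.draw and lambda.draw are assumptions, and the Dirichlet draw relies on the standard construction of normalizing independent gamma variates.

```r
# Sketch only: one draw from the default priors, outside of JAGS.
set.seed(2)                       # arbitrary seed for reproducibility
J     <- 3                        # hypothetical number of variant populations
alpha <- rep(1, J)                # diffuse Dirichlet prior: alpha_1 = ... = alpha_J = 1

# A Dirichlet(alpha) draw can be generated by normalizing independent
# Gamma(alpha_j, 1) variates.
g          <- rgamma(J, shape = alpha, rate = 1)
kappa.draw <- g / sum(g)          # mixing proportions, sum to 1

# Uniform(0,1) draws for each lambda_j; sorting fixes the labeling so that
# the last element plays the role of lambda_J, the tumor purity.
lambda.draw <- sort(runif(J))
```

Sorting imposes the ordering λ_1 ≤ … ≤ λ_J on every draw, so the component labels cannot swap between MCMC iterations.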
The optimality criterion used to select J is based on the penalized expected deviance (PED; Plummer, 2008). In instances where the optimism cannot be determined, it is approximated by twice the pD value, and a warning is issued that this approximation is being used.
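The fallback approximation mentioned above amounts to simple arithmetic: the penalized expected deviance is the expected deviance plus the optimism, with the optimism replaced by 2 * pD. The sketch below shows only that arithmetic. The helper ped.approx and all numbers are invented for illustration, and picking the smallest PED is a simplification, since the actual procedure also reports PED differences and their standard errors.

```r
# Sketch only: fallback PED approximation with made-up deviance summaries.
ped.approx <- function(mean.deviance, pD) {
  # penalized expected deviance = expected deviance + optimism,
  # with the optimism approximated by 2 * pD
  mean.deviance + 2 * pD
}

# Hypothetical summaries for fitted models with J = 1 and J = 2 populations
fits <- data.frame(J = 1:2,
                   mean.deviance = c(160, 140),
                   pD = c(1, 3))
fits$PED <- ped.approx(fits$mean.deviance, fits$pD)

# Simplified selection: the model with the smallest approximate PED
best.J <- fits$J[which.min(fits$PED)]
```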
Value
List object of designated class PurBayes, which includes data inputs N, Y, M, Z, as well as:
n.pop 
Numeric scalar corresponding to number of variant populations detected by PurBayes 
PB.post 

dev.mat 
a matrix of the penalized expected deviance results from the model selection procedure. This includes the penalized expected deviance, the difference in PED with the reference model, and the standard error of that difference. 
which.ref 
indicates which fitted model is the reference model in the penalized expected deviance analysis. This will either be the fitted model with the minimal PED. 
jag.fits 
List of learned JAGS models (object class 
Author(s)
Nicholas B. Larson
References
Plummer, M. (2008) Penalized loss functions for Bayesian model comparison. Biostatistics doi: 10.1093/biostatistics/kxm049
Examples
#Homogeneous tumor example
N.var <- 20
N <- round(runif(N.var, 20, 200))
lambda <- 0.75
Y <- rbinom(N.var, N, lambda/2)
## Not run: PB.hom <- PurBayes(N, Y)
#Heterogeneous tumor example - 1 subclonal population
N.var <- 20
N <- round(runif(N.var, 20, 200))
lambda.1 <- 0.75
lambda.2 <- 0.25
lambda <- c(rep(lambda.1, 10), rep(lambda.2, 10))
Y <- rbinom(N.var, N, lambda/2)
## Not run: PB.het <- PurBayes(N, Y)
