Bayesian Estimation of Tumor Purity and Clonality

Description

PurBayes is an iterative Bayesian algorithm that simultaneously estimates tumor purity and clonality with finite mixture models, using the MCMC software JAGS to obtain posterior samples for inference. Guided by a penalized deviance criterion, PurBayes iteratively fits models with an increasing number of variant populations until an optimal fit is achieved.

Usage

PurBayes(N, Y, M=NULL, Z=NULL, pop.max=5, prior=NULL, burn.in=50000,
 n.post = 10000, fn.jags = "PB.jags", plot = FALSE)

Arguments

N

numeric vector of total reads for each somatic mutation from the tumor tissue NGS data

Y

numeric vector of mutant allele supporting read counts for each somatic mutation from the tumor tissue NGS data

M

optional numeric vector of total reads for germline heterozygous variants. PurBayes uses these to estimate the non-reference allele mapping rate and so account for mapping bias

Z

optional numeric vector of alternate allele reads for germline heterozygous variants, corresponding to M

pop.max

Maximum number of variant populations allowed in the iterative modeling procedure. Defaults to 5.

prior

Optional prior distribution for λ_J under the homogeneous tumor model. If NULL, defaults to Uniform(0,1). WARNING: This must be provided as a character string written in the JAGS modeling language.

burn.in

Number of MCMC draws that are excluded as a burn-in. Defaults to 50000.

n.post

Number of MCMC draws that are sampled for posterior inference. Defaults to 10000.

fn.jags

File location and name to which write.PB generates the appropriate JAGS model file. Defaults to 'PB.jags' in the current working directory.

plot

If plot=TRUE, then plot.PurBayes is called to generate a visual representation of the data along with the model fit by PurBayes. Defaults to FALSE.
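
As a hedged illustration of the prior argument, the sketch below passes a Beta(2,2) prior as a JAGS-language string. It assumes that the string is the distribution expression on the right-hand side of a JAGS "~" statement; that format is an assumption and should be checked against the model file that write.PB generates. The name prior.str is illustrative.

```r
# Assumption: `prior` holds the right-hand side of a JAGS "~" statement.
# dbeta(a, b) is a standard JAGS distribution; Beta(2,2) is mildly
# informative on (0, 1) compared with the default Uniform(0,1).
prior.str <- "dbeta(2,2)"

# N and Y are the read-count vectors described above.
## Not run: PB.fit <- PurBayes(N, Y, prior = prior.str, pop.max = 3)
```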

Details

For a given tumor purity level λ, PurBayes assumes a binomial-binomial mixture model for the tumor sequence reads which support the alternate allele, Y_i^t \sim Bin(N_i,λ/2). This model is fit to the data under the assumption of tumor homogeneity. PurBayes also supports the possibility of intra-tumor heterogeneity, whereby the tumor tissue is composed of additional subclonal variant populations, each with its own 'purity', λ_j<λ, for j = 1,...,J-1 and λ_J \equiv λ.
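
As a rough illustration of the homogeneous model (and not of the JAGS-based sampler that PurBayes itself uses), the posterior of λ under a Uniform(0,1) prior can be approximated on a grid in base R. The names lambda.grid, log.lik, post and post.mean are illustrative and not part of the package.

```r
# Grid approximation to the posterior of lambda under
# Y_i ~ Bin(N_i, lambda/2) with a Uniform(0,1) prior.
# Illustrative only; PurBayes obtains posterior samples via JAGS.
set.seed(1)
N.var <- 20
N <- round(runif(N.var, 20, 200))
Y <- rbinom(N.var, N, 0.75 / 2)   # simulated with true purity 0.75

lambda.grid <- seq(0.001, 0.999, by = 0.001)

# Log-likelihood summed over variants at each candidate purity
log.lik <- sapply(lambda.grid,
                  function(l) sum(dbinom(Y, N, l / 2, log = TRUE)))

# Flat prior, so the posterior is proportional to the likelihood
post <- exp(log.lik - max(log.lik))
post <- post / sum(post)

post.mean <- sum(lambda.grid * post)   # posterior mean of purity
```

Subtracting max(log.lik) before exponentiating avoids numerical underflow when many variants are included.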

The probability that a given variant corresponds to the j^{th} population is given by κ_j, and \bm{κ}=(κ_1,…,κ_J) follows a Dirichlet prior such that π(\bm{κ})\sim Dirichlet(α_1,…,α_J) for a given variant population quantity J. PurBayes applies a diffuse prior on \bm{κ}, such that α_1=…=α_J=1. While the user may specify a particular prior for λ under a homogeneous tumor, PurBayes defaults to π(λ_j) \sim Uniform(0,1) for all j, and uses a sort function to avoid label switching.
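
The priors described above can be sketched by simulating one draw of the mixture parameters in base R. This is a minimal sketch under the stated defaults, not the JAGS model that PurBayes writes; how the sort is implemented inside that model is not shown here, and the variable names are illustrative.

```r
# One prior draw of the mixture parameters for J variant populations,
# followed by simulated alternate-allele read counts.
set.seed(2)
J <- 3

# Dirichlet(1, ..., 1) draw via normalised Gamma(1, 1) variates
g <- rgamma(J, shape = 1, rate = 1)
kappa <- g / sum(g)

# Uniform(0,1) draws sorted in ascending order, so that lambda[J]
# is the largest value and plays the role of the tumor purity
lambda <- sort(runif(J))

# Assign each of 20 variants to a population with probabilities kappa
N <- round(runif(20, 20, 200))
pop <- sample(seq_len(J), size = 20, replace = TRUE, prob = kappa)

# Reads supporting the alternate allele: Bin(N_i, lambda_j / 2)
Y <- rbinom(20, N, lambda[pop] / 2)
```

Sorting fixes the ordering of the λ_j, so the labels of the populations cannot be exchanged between draws.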

The optimality criterion used for model selection with regard to the size of J is based upon the penalized expected deviance (Plummer, 2008). In instances where the optimism cannot be determined, it is approximated by twice the pD value (along with a warning that this approximation is being used).
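
The arithmetic behind this criterion can be sketched as follows: the penalized expected deviance is the posterior mean deviance plus the optimism, with twice pD substituted when the optimism is unavailable. The helper ped and its argument names are hypothetical, and the input values are made-up numbers for illustration only. The selection rule that PurBayes actually applies, which also reports the standard error of PED differences, is not reproduced here.

```r
# Hypothetical helper: penalized expected deviance
#   PED = mean deviance + optimism
# If the optimism is unavailable (NA), fall back to 2 * pD with a warning.
ped <- function(mean.dev, p.opt = NA, pD = NA) {
  if (is.na(p.opt)) {
    warning("optimism unavailable; approximating by 2 * pD")
    p.opt <- 2 * pD
  }
  mean.dev + p.opt
}

# Made-up inputs for two candidate models (J = 1 and J = 2)
ped.1 <- ped(mean.dev = 150.2, p.opt = 2.1)
ped.2 <- suppressWarnings(ped(mean.dev = 120.5, pD = 3.0))  # fallback path

# Index of the candidate with the smaller PED
best <- which.min(c(ped.1, ped.2))
```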

Value

List object of designated class PurBayes, which includes data inputs N,Y,M,Z, as well as:

n.pop

Numeric scalar corresponding to number of variant populations detected by PurBayes

PB.post

mcmc.list object corresponding to posterior samples of PurBayes model parameters. This necessarily includes pur, the tumor purity. If n.pop>1, posterior samples of κ_j and λ_j for j = 1,...,J are also included.

dev.mat

Matrix of the penalized expected deviance (PED) results from the model selection procedure. This includes the PED, the difference in PED relative to the reference model, and the standard error of that difference.

which.ref

Indicates which fitted model is the reference model in the penalized expected deviance analysis. This will be the fitted model with the minimal PED.

jag.fits

List of learned JAGS models (object class jags) fit in the model selection process
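
A minimal sketch of how the returned posterior samples might be summarised is given below. The helper purity_summary is hypothetical and not part of PurBayes; it relies only on the PB.post element and the pur parameter named above, and assumes the coda package is loaded so that as.matrix can stack the chains of an mcmc.list.

```r
# Hypothetical helper (not part of PurBayes): posterior median and
# 95% equal-tailed interval for the tumor purity parameter "pur".
purity_summary <- function(fit) {
  # as.matrix() stacks the chains of an mcmc.list (method supplied by coda)
  draws <- as.matrix(fit$PB.post)[, "pur"]
  c(median = median(draws),
    lower  = unname(quantile(draws, 0.025)),
    upper  = unname(quantile(draws, 0.975)))
}

## Not run: purity_summary(PurBayes(N, Y))
```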

Author(s)

Nicholas B. Larson

References

Plummer, M. (2008) Penalized loss functions for Bayesian model comparison. Biostatistics doi: 10.1093/biostatistics/kxm049

Examples

#Homogeneous tumor example
N.var<-20
N<-round(runif(N.var,20,200))
lambda<-0.75
Y<-rbinom(N.var,N,lambda/2)
## Not run: PB.hom<-PurBayes(N,Y)

#Heterogeneous tumor example - 1 subclonal population
N.var<-20
N<-round(runif(N.var,20,200))
lambda.1<-0.75
lambda.2<-0.25
lambda<-c(rep(lambda.1,10),rep(lambda.2,10))
Y<-rbinom(N.var,N,lambda/2)
## Not run: PB.het<-PurBayes(N,Y)