ITH_optim: ITH_optim
In Sun-lab/SMASH: Subclone Multiplicity Allocation and Somatic Heterogeneity

Runs the EM algorithm for a given subclone configuration matrix.

ITH_optim(
  my_data,
  my_purity,
  init_eS,
  pi_eps0 = NULL,
  my_unc_q = NULL,
  max_iter = 4000,
  my_epsilon = 1e-06
)

`my_data`	An R data frame containing the following columns: `tAD`, tumor alternate read counts; `tRD`, tumor reference read counts; `CN_1`, minor allele count; `CN_2`, major allele count, where `CN_1 <= CN_2`; `tCN`, total copy number, equal to `CN_1 + CN_2`.
`my_purity`	A single numeric value of known/estimated purity
`init_eS`	A subclone configuration matrix pre-defined in R list `eS`
`pi_eps0`	A user-specified parameter denoting the proportion of loci not explained by the combinations of purity, copy number, multiplicity, and allocation. If `NULL`, it is initialized at 1e-3. If set to 0.0, the parameter is not estimated.
`my_unc_q`	An optimal initial vector for the unconstrained `q` vector, useful after running `grid_ITH_optim`. If this variable is `NULL`, the subclone proportions, `q`, are randomly initialized. For instance, if `my_unc_q = ( x1 , x2 )`, then `q = ( exp(x1) / (1 + exp(x1) + exp(x2)) , exp(x2) / (1 + exp(x1) + exp(x2)) , 1 / (1 + exp(x1) + exp(x2)) )`.
`max_iter`	Positive integer, preferably 1000 or more, setting the maximum number of iterations
`my_epsilon`	Convergence criterion threshold for changes in the log likelihood, preferably 1e-6 or smaller
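
As a rough illustration of the arguments above, the sketch below builds a toy `my_data` data frame with the documented columns and applies the `my_unc_q` to `q` mapping exactly as written in the argument description. The read counts and copy numbers are made up, and `unc_q_to_q` is a hypothetical helper written for this example; it is not a function exported by SMASH.

```r
# Toy input with the columns that `my_data` is documented to contain.
# All values are invented for illustration only.
my_data <- data.frame(
  tAD  = c(12L, 30L, 7L),   # tumor alternate read counts
  tRD  = c(48L, 25L, 60L),  # tumor reference read counts
  CN_1 = c(1L, 0L, 1L),     # minor allele count
  CN_2 = c(1L, 2L, 2L)      # major allele count
)
my_data$tCN <- my_data$CN_1 + my_data$CN_2

# The documentation requires CN_1 <= CN_2 for every locus.
stopifnot(all(my_data$CN_1 <= my_data$CN_2))

# Hypothetical helper (not part of SMASH): map an unconstrained vector
# (x1, ..., xk) to k + 1 subclone proportions, following the formula
# given for `my_unc_q`.
unc_q_to_q <- function(unc_q) {
  denom <- 1 + sum(exp(unc_q))
  c(exp(unc_q), 1) / denom
}

q <- unc_q_to_q(c(0.5, -1))
```

The resulting `q` has one more element than `my_unc_q`, every element is positive, and the elements sum to 1, which is why the unconstrained form is convenient for optimization.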

If the EM algorithm converges, the output is a list containing:

iter: number of iterations
converge: convergence status
unc_q0: initial unconstrained subclone proportions parameter
unc_q: unconstrained estimate of q
q: estimated subclone proportions among cancer cells
CN_MA_pi: estimated mixture probabilities of multiplicities and allocations given copy number states
eta: estimated subclone proportion among tumor cells
purity: user-inputted tumor purity
entropy: estimated entropy
infer: An R data frame containing inferred variant allocations (infer_A), multiplicities (infer_M), and cellular prevalences (infer_CP).
ms: model size, number of parameters within parameter space
LL: The observed log likelihood evaluated at maximum likelihood estimates.
AIC = 2 * LL - 2 * ms: Negative AIC, used for model selection
BIC = 2 * LL - ms * log(LOCI): Negative BIC, used for model selection
LOCI: The number of inputted somatic variants.
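
The two model selection quantities can be recomputed from `LL`, `ms`, and `LOCI` using the formulas listed above. The sketch below does this for two made-up fits; `neg_ic` is a hypothetical helper, and the log likelihoods and model sizes are invented numbers, not SMASH output. Because both values are negated criteria, the larger value indicates the preferred model.

```r
# Hypothetical helper (not part of SMASH): recompute the negative AIC and
# negative BIC from the documented formulas.
neg_ic <- function(LL, ms, LOCI) {
  c(AIC = 2 * LL - 2 * ms,
    BIC = 2 * LL - ms * log(LOCI))
}

# Invented fit summaries for two candidate configurations.
fits <- list(
  one_subclone  = neg_ic(LL = -520.4, ms = 3, LOCI = 100),
  two_subclones = neg_ic(LL = -512.9, ms = 5, LOCI = 100)
)

# Pick the configuration with the largest negative BIC.
bic  <- sapply(fits, function(x) x[["BIC"]])
best <- names(which.max(bic))
```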