ITH_optim: ITH_optim

View source: R/SMASH.R

ITH_optim    R Documentation

ITH_optim

Description

Runs the EM algorithm for a given subclone configuration matrix.

Usage

ITH_optim(
  my_data,
  my_purity,
  init_eS,
  pi_eps0 = NULL,
  my_unc_q = NULL,
  max_iter = 4000,
  my_epsilon = 1e-06
)

Arguments

my_data

An R data frame containing the following columns:

tAD

tumor alternate read counts

tRD

tumor reference read counts

CN_1

minor allele count

CN_2

major allele count, where CN_1 <= CN_2

tCN

CN_1 + CN_2

my_purity

A single numeric value of known/estimated purity

init_eS

A subclone configuration matrix, as pre-defined in the R list eS

pi_eps0

A user-specified parameter denoting the proportion of loci not explained by the combinations of purity, copy number, multiplicity, and allocation. If NULL, it is initialized at 1e-3. If set to 0.0, the parameter is not estimated.

my_unc_q

An optimal initial value for the unconstrained q vector, useful after running grid_ITH_optim. If this variable is NULL, the subclone proportions, q, are randomly initialized. For instance, if my_unc_q = (x1, x2), then q = (exp(x1) / (1 + exp(x1) + exp(x2)), exp(x2) / (1 + exp(x1) + exp(x2)), 1 / (1 + exp(x1) + exp(x2))).

max_iter

Positive integer, preferably 1000 or more, setting the maximum number of iterations

my_epsilon

Convergence criterion threshold for changes in the log likelihood, preferably 1e-6 or smaller
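
The inputs described above can be sketched in base R. This is only an illustration: the read counts and copy numbers are invented, and unc_q_to_q is a hypothetical helper (not part of SMASH) that applies the transformation stated for my_unc_q.

```r
# Illustrative input only: five loci with invented counts.
my_data <- data.frame(
  tAD  = c(12, 30, 8, 45, 20),   # tumor alternate read counts
  tRD  = c(88, 70, 92, 55, 80),  # tumor reference read counts
  CN_1 = c(1, 1, 0, 1, 1),       # minor allele count
  CN_2 = c(1, 2, 2, 1, 3)        # major allele count
)
my_data$tCN <- my_data$CN_1 + my_data$CN_2  # total copy number

# Basic checks implied by the column descriptions.
stopifnot(all(my_data$CN_1 <= my_data$CN_2))
stopifnot(all(my_data$tCN == my_data$CN_1 + my_data$CN_2))

# Hypothetical helper (an assumption, not a SMASH function): map an
# unconstrained vector to subclone proportions q using the formula
# given for my_unc_q. The last subclone acts as the reference category.
unc_q_to_q <- function(unc_q) {
  expd <- exp(unc_q)
  c(expd, 1) / (1 + sum(expd))
}

q <- unc_q_to_q(c(0.5, -1.0))
print(q)       # three proportions, one per subclone
print(sum(q))  # the proportions sum to 1
```

Under this mapping, an unconstrained vector of length k corresponds to k + 1 subclone proportions, so any real-valued my_unc_q yields a valid q.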

Value

If the EM algorithm converges, the output is a list containing:

iter

number of iterations

converge

convergence status

unc_q0

initial unconstrained subclone proportions parameter

unc_q

unconstrained estimate of q

q

estimated subclone proportions among cancer cells

CN_MA_pi

estimated mixture probabilities of multiplicities and allocations given copy number states

eta

estimated subclone proportion among tumor cells

purity

user-supplied tumor purity

entropy

estimated entropy

infer

An R data frame containing inferred variant allocations (infer_A), multiplicities (infer_M), and cellular prevalences (infer_CP).

ms

model size, the number of parameters in the parameter space

LL

The observed log likelihood evaluated at maximum likelihood estimates.

AIC = 2 * LL - 2 * ms

Negative AIC, used for model selection

BIC = 2 * LL - ms * log(LOCI)

Negative BIC, used for model selection

LOCI

The number of input somatic variants.
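
As a worked illustration of the two model-selection scores listed above, the snippet below evaluates them for invented values of LL, ms, and LOCI. The helper name neg_ic and all numbers are assumptions made for this example. Because both scores are negated criteria, a larger value indicates the preferred model.

```r
# Hypothetical helper (not a SMASH function): compute the negative AIC
# and negative BIC from the formulas listed in the Value section.
neg_ic <- function(LL, ms, LOCI) {
  c(AIC = 2 * LL - 2 * ms,
    BIC = 2 * LL - ms * log(LOCI))
}

# Two invented fits on the same 200 variants: the larger model gains
# 1.5 log-likelihood units but uses three more parameters.
fit_small <- neg_ic(LL = -1250.0, ms = 4, LOCI = 200)
fit_large <- neg_ic(LL = -1248.5, ms = 7, LOCI = 200)

print(rbind(fit_small, fit_large))

# The fit with the larger score is preferred under each criterion.
print(fit_small > fit_large)
```

With these made-up numbers the smaller configuration is preferred under both criteria, since the gain in log likelihood does not offset the penalty for the extra parameters.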


Sun-lab/SMASH documentation built on March 5, 2025, 8 p.m.