Clomial: Fits several binomial models to data from multiple samples of...

Description Usage Arguments Details Value Note Author(s) References See Also Examples

View source: R/Clomial.R

Description

Using EM, trains several models using different initial values to escape from local optima. The best one in terms of the likelihood can be later chosen by choose.best() function.

Usage

1
2
3
4
5
Clomial(Dt = NULL, Dc = NULL, DcDtFile = NULL, C, doParal=FALSE,
outPrefix = NULL, binomTryNum = 1000, maxIt = 100, llCutoff = 0.001,
jobNamePrefix = "Bi", qstatWait = 2, fitBinomJobFile = NULL,
jobShare = 10, ignoredSample = c(),  fliProb=0.05, conservative=TRUE,
doTalk=FALSE)

Arguments

Dt

A matrix which contains the counts of the alternative allele where rows correspond to the genomic loci, and columns correspond to the samples.

Dc

A matrix which contains the counts of the total number of mapped reads where rows correspond to the genomic loci, and columns correspond to the samples.

DcDtFile

A file from which the data can optionally be loaded. It should contain the matrices Dc and Dt.

C

The assumed number of clones.

doParal

Boolean where TRUE means, in Linux, models with different initialization are trained in parallel on a cluster using qsub.

outPrefix

A prefix for the path to save the results.

binomTryNum

The number of models trained using different initialization.

maxIt

The maximum number of EM iterations.

llCutoff

EM iterations stops if the relative improvement in the log-likelihood is not more than this threshold.

jobNamePrefix

If run in parallel, this prefix will be used to name the jobs on the cluster.

qstatWait

The waiting time between qstat commands to assess the number of running and waiting jobs.

fitBinomJobFile

If run in parallel, this is the script which loads data, trains a model using a random initialization, and saves the results.

jobShare

If run in parallel, the job_share option of qsub determines the priority of jobs over other submitted jobs.

ignoredSample

A vector of indices of samples which will be ignored in training. Used by experts only to measure the stability of the results.

fliProb

A "flipping probability" used for noise injection which can be disabled when fliProb=0. After the first EM iteration, each entry of the matrix Mu such as m may change to 1-m with this probability. This probability decreases on subsequent iterations.

conservative

Boolean where TRUE means noise will be injected only if likelihood is improved after an EM iteration, otherwise the original Mu matrix will be used for the next iteration. For expert use only.

doTalk

If TRUE, information on the EM optimization iterations is reported.

Details

The likelihood of the model, given the hidden variables and the parameters, can be computed based on a combination of binomial distributions. In each EM iteration, the likelihood is increased, however, due to presence of local optima, several models should be tried using different random initialization. For higher number of assumed clones, C, the parameter binomTryNum should be increased because the dimension of the search space grows linearly with C.

Value

Returns a list containing the entry called models, which is a list of the length equal to binomTryNum where each element is a trained model. For each trained model, Mu models the matrix of genotypes, where rows and columns correspond to genomic loci and clones, accordingly. Also, P is the matrix of clonal frequency where rows and columns correspond to clones and samples, accordingly. The first column of P corresponds to the normal clone. The history of Mu, P, and the log-likelihood over iterations is saved in lists Ps, Mus, and Likelihoods, accordingly.

Note

The parallel mode works only in Linux, and when qsub and qstat commands are available on a cluster.

Author(s)

Habil Zare

References

Inferring clonal composition from multiple sections of a breast cancer, Zare et al., Submitted.

See Also

Clomial, choose.best, Clomial.iterate, compute.bic, breastCancer

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
set.seed(1)
data(breastCancer)
Dc <- breastCancer$Dc
Dt <- breastCancer$Dt
ClomialResult <-Clomial(Dc=Dc,Dt=Dt,maxIt=20,C=4,binomTryNum=2)
chosen <- choose.best(models=ClomialResult$models)
M1 <- chosen$bestModel
print("Genotypes:")
round(M1$Mu)
print("Clone frequencies:")
M1$P

Clomial documentation built on Nov. 8, 2020, 8:16 p.m.