# logLikePoisMix: Log likelihood calculation for a Poisson mixture model In HTSCluster: Clustering High-Throughput Transcriptome Sequencing (HTS) Data

## Description

Functions to calculate the log likelihood for a Poisson mixture model, the difference in log likelihoods for two different sets of parameters of a Poisson mixture model or the log-likelihood for each observation.

## Usage

 1 2 3 logLikePoisMix(y, mean, pi) logLikePoisMixDiff(y, mean.new, pi.new, mean.old, pi.old) mylogLikePoisMixObs(y, conds, s, lambda, pi) 

## Arguments

 y (n x q) matrix of observed counts for n observations and q variables mean List of length g containing the (n x q) matrices of conditional mean expression for all observations, as calculated by the PoisMixMean function, where g represents the number of clusters mean.new List of length g containing the (n x q) matrices of conditional mean expression for all observations for one set of parameters, as calculated by the PoisMixMean function, where g represents the number of clusters mean.old List of length g containing the (n x q) matrices of conditional mean expression for all observations for another set of parameters, as calculated by the PoisMixMean function, where g represents the number of clusters pi.new Vector of length g containing one estimate for \hat{π} pi.old Vector of length g containing another estimate for \hat{π} pi Vector of length g containing estimate for \hat{π} conds Vector of length q defining the condition (treatment group) for each variable (column) in y s Estimate of normalized per-variable library size lambda (d x g) matrix containing the current estimate of lambda, where d is the number of conditions (treatment groups) and g is the number of clusters

## Details

The logLikePoisMixDiff function is used to calculate the difference in log likelihood for two different sets of parameters in a Poisson mixture model; it is used to determine convergence in the EM algorithm run by the PoisMixClus function. The logLikePoisMix function (taken largely from the mylogLikePoisMix function from the poisson.glm.mix R package) calculates the log likelihood for a given set of parameters in a Poisson mixture model and is used in the PoisMixClus function for the calculation of the BIC and ICL. The mylogLikePoisMixObs function calculates the log likelihood per observation for a given set of parameters in a Poisson mixture model.

## Value

 ll  (Depending on the context), the log likelihood, difference in log likelihoods for two different sets of parameters, or per-observation log-likelihood

## Note

In the logLikePoisMixDiff function, we make use of the alternative mass function for a Poisson density proposed by Loader (2000) to avoid computational difficulties. The logLikePoisMixDiff function returns a default value of 100 if one or both of the log likelihoods associated with the two parameter sets takes on a value of -∞.

## Author(s)

Andrea Rau <andrea.rau@jouy.inra.fr>

## References

Loader, C. (2000) Fast and accurate computation of binomial probabilities. Available at https://lists.gnu.org/archive/html/octave-maintainers/2011-09/pdfK0uKOST642.pdf.

Rau, A., Maugis-Rabusseau, C., Martin-Magniette, M.-L., Celeux G. (2015). Co-expression analysis of high-throughput transcriptome sequencing data with Poisson mixture models. Bioinformatics, 31(9):1420-1427.

Rau, A., Celeux, G., Martin-Magniette, M.-L., Maugis-Rabusseau, C. (2011) Clustering high-throughput sequencing data with Poisson mixture models. Inria Research Report 7786. Available at http://hal.inria.fr/inria-00638082.

PoisMixClus for Poisson mixture model estimation and model selection; PoisMixMean to calculate the per-cluster conditional mean of each observation
  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 set.seed(12345) ## Simulate data as shown in Rau et al. (2011) ## Library size setting "A", low cluster separation ## n = 200 observations simulate <- PoisMixSim(n = 200, libsize = "A", separation = "low") y <- simulate$y conds <- simulate$conditions w <- rowSums(y) ## Estimate of w r <- table(conds) ## Number of replicates per condition d <- length(unique(conds)) ## Number of conditions s <- colSums(y) / sum(y) ## TC estimate of lib size s.dot <- rep(NA, d) ## Summing lib size within conditions for(j in 1:d) s.dot[j] <- sum(s[which(conds == unique(conds)[j])]); ## Initial guess for pi and lambda g.true <- 4 pi.guess <- simulate$pi ## Recalibrate so that (s.dot * lambda.guess) = 1 lambda.sim <- simulate$lambda lambda.guess <- matrix(NA, nrow = d, ncol = g.true) for(k in 1:g.true) { tmp <- lambda.sim[,k]/sum(lambda.sim[,k]) lambda.guess[,k] <- tmp/s.dot } ## Run the PMM-II model for g = 4 ## with EM algorithm and "TC" library size parameter run <- PoisMixClus(y, g = 4, norm = "TC", conds = conds) pi.est <- run$pi lambda.est <- run$lambda ## Mean values for each of the parameter sets mean.guess <- PoisMixMean(y, 4, conds, s, lambda.guess) mean.est <- PoisMixMean(y, 4, conds, s, lambda.est) ## Difference in log likelihoods LL.diff <- logLikePoisMixDiff(y, mean.guess, pi.guess, mean.est, pi.est) LL.diff ## -12841.11