kfoots: Fit a mixture model or a hidden Markov model

Description Usage Arguments Value

View source: R/kfoots.R

Description

Fit a mixture model or a hidden Markov model

Usage

kfoots(counts, k, framework = c("HMM", "MM"), mix_coeff = NULL,
  trans = NULL, initP = NULL, tol = 1e-04, maxiter = 200,
  nthreads = 1, nbtype = c("dep", "indep", "pois"), init = c("pca",
  "counts", "rnd"), init.nlev = 20, verbose = TRUE,
  seqlens = ncol(counts), split4speed = FALSE)
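For illustration, a minimal call on simulated data might look like the following. The matrix shape and argument values are made up for the example; the call itself mirrors the signature above:

```r
set.seed(1)

# Simulated counts matrix: 3 dimensions (rows) x 1000 datapoints (columns)
counts <- matrix(rpois(3 * 1000, lambda = 5), nrow = 3)

# Fit a 2-state HMM, treating all columns as a single sequence
# (guarded so this sketch also runs where kfoots is not installed)
if (requireNamespace("kfoots", quietly = TRUE)) {
  fit <- kfoots::kfoots(counts, k = 2, framework = "HMM",
                        nbtype = "dep", maxiter = 50, verbose = FALSE)
}
```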

Arguments

counts

matrix of non-negative integers. Columns represent datapoints and rows represent dimensions.

k

either the desired number of clusters, or specific initial values for the models (mixture components or emission probabilities). See the models item in the return value for how the model parameters should be formatted.

framework

Switches between a mixture model and a hidden Markov model. The default is a hidden Markov model, where the order of the datapoints matters.

mix_coeff

In the MM mode, initial values for the mixture coefficients. In the HMM mode this argument is ignored.

trans

In the HMM mode, initial values for the transition probabilities as a square matrix. The rows are the 'state from' and the columns are the 'state to', so each row must sum up to 1. In the MM mode this argument is ignored.
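As a sketch, a valid trans argument for a 3-state model could be built as below. The probability values are arbitrary; the only documented constraint is that each row sums to 1:

```r
# Rows are 'state from', columns are 'state to'; each row sums to 1
trans <- matrix(c(0.90, 0.05, 0.05,
                  0.10, 0.80, 0.10,
                  0.05, 0.15, 0.80),
                nrow = 3, byrow = TRUE)
stopifnot(all(abs(rowSums(trans) - 1) < 1e-12))
```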

initP

In the HMM mode, initial probabilities for each sequence of observation. They must be formatted as a matrix where each row is a state and each column is a sequence.
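A matching initP for a 2-state model and 3 sequences might look like the sketch below. The uniform start is an assumption for the example; the documentation only fixes the layout, with states as rows and sequences as columns:

```r
# Uniform initial probabilities: 2 states (rows) x 3 sequences (columns)
initP <- matrix(1 / 2, nrow = 2, ncol = 3)
colSums(initP)  # each column is a probability distribution summing to 1
```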

tol

Tolerance value used to determine convergence of the EM algorithm. The algorithm is considered converged when the absolute difference in the log-likelihood between two iterations falls below this value.

maxiter

maximum number of iterations of the EM algorithm. Use 0 if you don't want to perform any training iterations.

nthreads

number of threads used. The backward-forward step in the HMM learning cannot use more threads than the number of sequences.

nbtype

type of training for the negative binomial. Accepted types are indep, dep and pois. The first corresponds to standard maximum-likelihood estimates for each parameter of each model; the second forces the dispersion parameter r of the negative multinomials to be the same for all models; the third forces r to infinity, that is, every model becomes a Poisson distribution.
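The pois limit can be checked directly in base R: as the dispersion r grows, the negative binomial density approaches the Poisson density with the same mean. This is a numerical sketch of that fact, not kfoots code:

```r
x <- 0:10
mu <- 4
# With a very large size (r), dnbinom is numerically indistinguishable
# from dpois with lambda = mu
nb <- dnbinom(x, mu = mu, size = 1e9)
po <- dpois(x, lambda = mu)
max(abs(nb - po))  # tiny
```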

init

Initialization scheme for the models (mixture components or emission probabilities). The value rnd results in parameters being chosen randomly; the values counts and pca use an initialization algorithm that starts from init.nlev*nrow(counts) clusters and reduces them to k using hierarchical clustering.

init.nlev

Tuning parameter for the initialization schemes counts and pca.

verbose

whether to print some progress output during execution

seqlens

Length of each sequence of observations. The number of columns of the count matrix should equal sum(seqlens).
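For example, with two sequences of observations (say, two chromosomes binned into 600 and 400 windows; the numbers are illustrative), the counts matrix and seqlens must agree as follows:

```r
seqlens <- c(600, 400)
# 3 dimensions, one column per observation across both sequences
counts <- matrix(rpois(3 * sum(seqlens), lambda = 4), nrow = 3)
stopifnot(ncol(counts) == sum(seqlens))
```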

split4speed

Add artificial breaks to speed up the forward-backward algorithm. If framework=="HMM" and multiple threads are used, the count matrix, which is already split according to seqlens, is split even further so that each thread can be assigned an equal number of observations in the forward-backward algorithm. These artificial breaks usually have a small impact on the final parameters, and they improve scalability with the number of cores, especially when the number of sequences is small compared to the number of cores. The artificial breaks are removed after the training phase when computing the final state assignments.

Value

a list with, among others, the following elements:

models

a list containing the parameters of each model (mixture components or emission probabilities). Each element of the list describes a negative multinomial distribution, specified as another list with items mu, r and ps. mu and r correspond to the parameters mu and size of the R function dnbinom. ps specifies the parameters of the multinomial part and sums up to 1.
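A single element of models therefore looks like the hypothetical list below (the numbers are invented for illustration); mu and r plug directly into dnbinom's mu and size arguments:

```r
# Hypothetical fitted model for a counts matrix with 3 rows
model <- list(mu = 12, r = 3, ps = c(0.5, 0.3, 0.2))
stopifnot(abs(sum(model$ps) - 1) < 1e-12)

# mu and r map onto dnbinom's mu and size parameters
dnbinom(0:3, mu = model$mu, size = model$r)
```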

loglik

the log-likelihood of the whole dataset.

posteriors

A matrix of size length(models)*ncol(counts) containing the posterior probability that a given datapoint was generated by the given mixture component.

states

An integer vector of length ncol(counts) indicating which model each column is associated with (using the posterior decoding algorithm).

converged

TRUE if the algorithm converged within the given number of iterations, FALSE otherwise.

llhistory

time series containing the log-likelihood of the whole dataset across iterations

viterbi

In HMM mode, the Viterbi path and its likelihood as a list.


lamortenera/kfoots documentation built on May 20, 2019, 7:34 p.m.