kfoots: Fit a mixture model or a hidden Markov model

Description Usage Arguments Value

View source: R/kfoots.R

Description

Fit a mixture model or a hidden Markov model

Usage

kfoots(counts, k, framework = c("HMM", "MM"), mix_coeff = NULL,
  trans = NULL, initP = NULL, tol = 1e-04, maxiter = 200,
  nthreads = 1, nbtype = c("dep", "indep", "pois"), init = c("pca",
  "counts", "rnd"), init.nlev = 20, verbose = TRUE,
  seqlens = ncol(counts), split4speed = FALSE)
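For illustration, a minimal call on simulated data might look like the following. The matrix shape and argument values are made up for the example; the call itself mirrors the signature above:

```r
set.seed(1)

# Simulated counts matrix: 3 dimensions (rows) x 1000 datapoints (columns)
counts <- matrix(rpois(3 * 1000, lambda = 5), nrow = 3)

# Fit a 2-state HMM, treating all columns as a single sequence
# (guarded so this sketch also runs where kfoots is not installed)
if (requireNamespace("kfoots", quietly = TRUE)) {
  fit <- kfoots::kfoots(counts, k = 2, framework = "HMM",
                        nbtype = "dep", maxiter = 50, verbose = FALSE)
}
```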

Arguments

counts

matrix of non-negative integers. Columns represent datapoints and rows represent dimensions.

k

either the desired number of clusters, or specific initial values for the models (mixture components or emission probabilities). See the models item in the return value for how the model parameters should be formatted.

framework

Switches between a mixture model and a hidden Markov model. The default is a hidden Markov model, where the order of the datapoints matters.

mix_coeff

In the MM mode, initial values for the mixture coefficients. In the HMM mode this argument is ignored.

trans

In the HMM mode, initial values for the transition probabilities as a square matrix. The rows are the 'state from' and the columns are the 'state to', so each row must sum up to 1. In the MM mode this argument is ignored.
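As a sketch, a valid trans argument for a 3-state model could be built as below. The probability values are arbitrary; the only documented constraint is that each row sums to 1:

```r
# Rows are 'state from', columns are 'state to'; each row sums to 1
trans <- matrix(c(0.90, 0.05, 0.05,
                  0.10, 0.80, 0.10,
                  0.05, 0.15, 0.80),
                nrow = 3, byrow = TRUE)
stopifnot(all(abs(rowSums(trans) - 1) < 1e-12))
```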

initP

In the HMM mode, initial probabilities for each sequence of observation. They must be formatted as a matrix where each row is a state and each column is a sequence.
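A matching initP for a 2-state model and 3 sequences might look like the sketch below. The uniform start is an assumption for the example; the documentation only fixes the layout, with states as rows and sequences as columns:

```r
# Uniform initial probabilities: 2 states (rows) x 3 sequences (columns)
initP <- matrix(1 / 2, nrow = 2, ncol = 3)
colSums(initP)  # each column is a probability distribution summing to 1
```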

tol

Tolerance value used to determine convergence of the EM algorithm. The algorithm is considered converged when the absolute difference in the log-likelihood between two iterations falls below this value.

maxiter

maximum number of iterations of the EM algorithm. Use 0 if you don't want to perform any training iterations.

nthreads

number of threads used. The backward-forward step in the HMM learning cannot use more threads than the number of sequences.

nbtype

type of training for the negative binomial. Accepted types are indep, dep and pois. The first corresponds to standard maximum-likelihood estimates for each parameter of each model; the second forces the dispersion parameter r of the negative multinomials to be the same for all models; the third forces r to infinity, that is, every model becomes a Poisson distribution.
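The pois limit can be checked directly in base R: as the dispersion r grows, the negative binomial density approaches the Poisson density with the same mean. This is a numerical sketch of that fact, not kfoots code:

```r
x <- 0:10
mu <- 4
# With a very large size (r), dnbinom is numerically indistinguishable
# from dpois with lambda = mu
nb <- dnbinom(x, mu = mu, size = 1e9)
po <- dpois(x, lambda = mu)
max(abs(nb - po))  # tiny
```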

init

Initialization scheme for the models (mixture components or emission probabilities). The value rnd results in parameters being chosen randomly; the values counts and pca use an initialization algorithm that starts from init.nlev*nrow(counts) clusters and reduces them to k using hierarchical clustering.

init.nlev

Tuning parameter for the initialization schemes counts and pca.

verbose

whether to print some progress output during execution

seqlens

Length of each sequence of observations. The number of columns of the count matrix should equal sum(seqlens).
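For example, with two sequences of observations (say, two chromosomes binned into 600 and 400 windows; the numbers are illustrative), the counts matrix and seqlens must agree as follows:

```r
seqlens <- c(600, 400)
# 3 dimensions, one column per observation across both sequences
counts <- matrix(rpois(3 * sum(seqlens), lambda = 4), nrow = 3)
stopifnot(ncol(counts) == sum(seqlens))
```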

split4speed

Add artificial breaks to speed up the forward-backward algorithm. If framework=="HMM" and multiple threads are used, the count matrix, which is already split according to seqlens, is split even further so that each thread can be assigned an equal number of observations in the forward-backward algorithm. These artificial breaks usually have a small impact on the final parameters, and they improve scalability with the number of cores, especially when the number of sequences is small compared to the number of cores. The artificial breaks are removed after the training phase when computing the final state assignments.

Value

a list with, among others, the following elements:

models

a list containing the parameters of each model (mixture components or emission probabilities). Each element of the list describes a negative multinomial distribution, specified as another list with items mu, r and ps. mu and r correspond to the parameters mu and size of the R function dnbinom. ps specifies the parameters of the multinomial part and sums up to 1.
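A single element of models therefore looks like the hypothetical list below (the numbers are invented for illustration); mu and r plug directly into dnbinom's mu and size arguments:

```r
# Hypothetical fitted model for a counts matrix with 3 rows
model <- list(mu = 12, r = 3, ps = c(0.5, 0.3, 0.2))
stopifnot(abs(sum(model$ps) - 1) < 1e-12)

# mu and r map onto dnbinom's mu and size parameters
dnbinom(0:3, mu = model$mu, size = model$r)
```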

loglik

the log-likelihood of the whole dataset.

posteriors

A matrix of size length(models)*ncol(counts) containing the posterior probability that a given datapoint was generated by the given mixture component.

states

An integer vector of length ncol(counts) indicating which model each column is associated with (using the posterior decoding algorithm).

converged

TRUE if the algorithm converged within the given number of iterations, FALSE otherwise.

llhistory

time series containing the log-likelihood of the whole dataset across iterations

viterbi

In HMM mode, the Viterbi path and its likelihood as a list.


lamortenera/kfoots documentation built on May 20, 2019, 7:34 p.m.