mixmod: Single Data Source Mixture Model
In lcmix: Layered and chained mixture models


Fit a finite mixture model to a single source of data using one of several distributions.

mixmod(X, K, family=names(LC_FAMILY), prior=NULL, iter.max=LC_ITER_MAX, 
    dname=deparse(substitute(X)))
## S3 method for class 'mixmod'
print(x, ...)

`X`	for univariate data, a vector; for multivariate data, a matrix or data frame. Must consist only of numeric values. Each element of the vector, or each row of the matrix or data frame, should represent an independent observation.
`K`	the number of components, an integer greater than or equal to 1. `K=1` will result in the distribution specified by `family` being fitted to the entire data set, and is not particularly useful.
`family`	a string, one of the supported distribution family names given in `LC_FAMILY`. By default, `"normal"` is used. Partial matches are allowed.
`prior`	prior probability distribution on Y. This feature is under development and its use is not currently recommended.
`iter.max`	the maximum number of iterations for the EM algorithm, by default equal to `LC_ITER_MAX`.
`dname`	the name of the data.
`x`	an object of class `mixmod`.
`...`	further arguments to `print.default`.

In the finite mixture model used here, a hidden categorical random variable Y takes a value from 1 to some positive integer K, and that value determines which component distribution the observed random variable X is drawn from. Specifically, mixmod fits a mixture model of the form

f(x) = sum_k p_k f_k(x)

where k = 1, …, K and each f_k(.) is a density function on the sample space of X. The p_k's, that is, the component probabilities, sum to 1.
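As a minimal illustration of the density above, the following base R sketch evaluates f(x) for a univariate normal mixture. The parameter values and the helper name `mixture_density` are hypothetical and are not part of lcmix.

```r
# Hypothetical parameters for a K = 3 univariate normal mixture.
p     <- c(0.5, 0.3, 0.2)   # component probabilities p_k, summing to 1
mu    <- c(-2, 0, 3)        # component means
sigma <- c(1, 0.5, 1.5)     # component standard deviations

# f(x) = sum_k p_k f_k(x), with each f_k a normal density
mixture_density <- function(x) {
  # G is the length(x)-by-K matrix whose (n, k)th element is f_k(x_n)
  G <- sapply(seq_along(p), function(k) dnorm(x, mu[k], sigma[k]))
  G <- matrix(G, nrow = length(x))
  as.vector(G %*% p)
}

mixture_density(c(-2, 0, 3))

# Because the p_k sum to 1 and each f_k is a density, f integrates to 1
total <- integrate(mixture_density, -Inf, Inf)$value
```

Here `total` should equal 1 up to numerical integration error.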

The EM algorithm used in model fitting attempts to maximize the Q-value, that is, the expected complete data log-likelihood, for the model. An iteration that increases the Q-value cannot decrease the log-likelihood for the density given above, so the algorithm converges toward a maximum of the log-likelihood, which may be a local rather than a global maximum.
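The E- and M-steps can be sketched in base R for a univariate normal mixture. This is a textbook-style illustration on simulated data, not the lcmix implementation; the starting values and the fixed count of 50 iterations are arbitrary choices made for the example.

```r
set.seed(1)
# Simulated data from two normal components (hypothetical example data)
x <- c(rnorm(150, mean = 0, sd = 1), rnorm(100, mean = 4, sd = 1))
K <- 2

# Crude starting values
p     <- rep(1 / K, K)
mu    <- as.numeric(quantile(x, c(0.25, 0.75)))
sigma <- rep(sd(x), K)

llik <- numeric(0)
for (iter in 1:50) {
  # E-step: W[n, k] is the posterior probability of component k given x_n
  G  <- sapply(1:K, function(k) dnorm(x, mu[k], sigma[k]))
  fX <- as.vector(G %*% p)
  W  <- sweep(G, 2, p, `*`) / fX
  llik[iter] <- sum(log(fX))   # log-likelihood at the current parameters

  # M-step: weighted estimates that maximize the Q-value given W
  Nk    <- colSums(W)
  p     <- Nk / length(x)
  mu    <- colSums(W * x) / Nk
  sigma <- sqrt(colSums(W * outer(x, mu, "-")^2) / Nk)
}

# The log-likelihood should never decrease from one iteration to the next
all(diff(llik) >= -1e-8)
```

The matrix `W` plays the same role as the weights described under Value, and the sequence `llik` shows the monotone behavior that makes EM usable as a likelihood maximizer.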

A list of class mixmod, having the following elements:

`N`	the length of the data, that is, `length(X)` if `X` is a vector, or `nrow(X)` if `X` is a matrix or data frame.
`D`	the width of the data, that is, 1 if `X` is a vector, or `ncol(X)` if `X` is a matrix or data frame.
`K`	the number of components in the mixture model.
`X`	the original data; if `X` was a data frame, it will have been converted to a matrix.
`npar`	the total number of parameters in the model.
`npar.hidden`	the number of parameters for the hidden component portion of the model.
`npar.observed`	the number of parameters for the observed data portion of the model.
`iter`	the number of iterations required to fit the model.
`params`	the parameters estimated for the model. This is a list with elements `hidden` and `observed`, corresponding to the distributions for the hidden and observed portions of the model. `hidden` always has one element, `prob`, the vector of p_k's. The elements of `observed` depend on the distribution family chosen in fitting the model.
`stats`	a vector with named elements corresponding to the number of iterations, log-likelihood, Q-value, BIC, and ICL-BIC for the estimated parameters (`iter`, `llik`, `qval`, `bic`, and `iclbic` in the printed output shown in the Examples).
`weights`	a list with the single element `W`, the N-by-K matrix of weights used in the M-step of the EM algorithm for estimating the final set of parameters for the observed data portion of the model.
`pdfs`	a list with two elements: `G`, the N-by-K matrix of which the (n,k)th element is the estimated value of f_k(x_n), where x_n is the nth observation in `X`; and `fX`, the vector of length `N` of which the nth element is the estimated value of f(x_n).
`posterior`	the N-by-K matrix of which the (n,k)th element is the estimated posterior probability that the nth observation was generated by the kth component. Equal to the `W` element of `weights`.
`assignment`	the vector of length N of which the nth element is the most probable component to have generated the nth observation. In other words, `assignment[n] = which.max(posterior[n,])`.
`iteration.params`	a list of length `iter` giving the estimated parameters at each iteration of the algorithm.
`iteration.stats`	a data frame of `iter` rows giving iteration statistics, as in `stats`, at each iteration of the algorithm.
`family`	the name of the distribution family used in the model. See `LC_FAMILY`.
`distn`	the name of the actual distribution used in the model. See `LC_FAMILY`.
`prior`	the value of the `prior` parameter used in model fitting. See Arguments.
`iter.max`	the maximum number of iterations allowed in model fitting.
`dname`	the name of the data.
`dattr`	attributes of the data, used by model likelihood functions to determine if the data have been scaled or otherwise transformed.
`kvec`	a vector of integers from 1 to K.
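The relationships among `pdfs`, `posterior`, and `assignment` described above can be sketched in base R on a small made-up example. The numbers below are hypothetical stand-ins for elements of a fitted object, and whether lcmix computes these quantities in exactly this way internally is an assumption.

```r
# Hypothetical N = 4, K = 2 values standing in for elements of a fitted object
prob <- c(0.6, 0.4)                        # plays the role of params$hidden$prob
G <- matrix(c(0.30, 0.05,
              0.10, 0.20,
              0.02, 0.40,
              0.25, 0.25),
            ncol = 2, byrow = TRUE)        # plays the role of pdfs$G

# f(x_n) = sum_k p_k f_k(x_n), the role of pdfs$fX
fX <- as.vector(G %*% prob)

# Posterior probability of component k given x_n, by Bayes' rule
posterior <- sweep(G, 2, prob, `*`) / fX

# Most probable component for each observation
assignment <- apply(posterior, 1, which.max)
```

Each row of `posterior` sums to 1, and `assignment[n]` equals `which.max(posterior[n,])` as stated in the description of `assignment`.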

Daniel Dvorkin

McLachlan, G.J. and Krishnan, T. (2008) The EM Algorithm and Extensions, John Wiley & Sons.

LC_FAMILY for distributions and families; mdmixmod for fitting multiple-data mixture models; reporting and likelihood for model reporting; rocinfo for performance evaluation; convergencePlot for behavior of the algorithm; simulation for simulating from the parameters of a model; packages mixtools and mclust.

## Not run:  
data(CiData)
data(CiGene)
fit <- mixmod(CiData$expression, 3)
fit
# Normal mixture model ('mvnorm')
# Data 'CiData$expression' of size 10244-by-4 fitted to 3 components
# Model statistics:
#       iter       llik       qval        bic     iclbic 
#      42.00  -47499.54  -50052.71  -95405.40 -100511.73
plot(rocinfo(fit, CiGene$target))

## End(Not run)
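As a rough arithmetic check on the printed statistics above, the values appear consistent with `bic = 2 * llik - npar * log(N)` and `iclbic = 2 * qval - npar * log(N)`. This is an inference from the numbers shown, not a documented formula, and `npar = 44` is an assumption based on an unrestricted 4-dimensional normal mixture with 3 components.

```r
N    <- 10244     # number of rows, from the printed output
npar <- 44        # assumed: 2 free probabilities + 3 * (4 means + 10 covariance terms)
llik <- -47499.54 # printed log-likelihood
qval <- -50052.71 # printed Q-value

# Candidate formulas (assumptions inferred from the printed values)
bic_check    <- 2 * llik - npar * log(N)
iclbic_check <- 2 * qval - npar * log(N)

round(c(bic = bic_check, iclbic = iclbic_check), 2)
```

Both results agree with the printed `bic` and `iclbic` to within the rounding of the inputs. Under this sign convention, larger values would indicate a better trade-off between fit and complexity.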