# Multiple Data Mixture Models

### Description

Fits a layered or chained mixture model to a list representing multiple sources of data, using a choice of distributions and number of components for each data source.

### Usage

```
mdmixmod(X, K, K0=min(K), topology=LC_TOPOLOGY, family=NULL, prior=NULL,
    prefit=TRUE, iter.max=LC_ITER_MAX, dname=deparse(substitute(X)))

## S3 method for class 'mdmixmod'
print(x, ...)
```

### Arguments

`X`
a list of observed data sources; the elements must be numeric vectors, matrices, or data frames. Each element of

`K`
the vector of numbers of mixture components for the hidden variables corresponding to each observed data source. If

`K0`
the number of mixture components for the top-level hidden variable.

`topology`
one of the model topologies in `LC_TOPOLOGY`.

`family`
a vector of names of distribution families to be used in fitting the models for each observed data source; each element of

`prior`
prior probability distribution on

`prefit`
logical; if

`iter.max`
the maximum number of iterations for the EM algorithm, by default equal to `LC_ITER_MAX`.

`dname`
the name of the data.

`x`
an object of class `mdmixmod`.

`...`
further arguments to

### Details

In the layered model, a top-level hidden categorical random variable *Y_0*, which can take on values from 1 to some positive integer *K_0*, generates categorical hidden random variables *Y_1, …, Y_Z* for some positive integer *Z*. For *z = 1,…,Z*, each *Y_z* can take on values from 1 to some positive integer *K_z*. In the chained model, *Y_0* generates *Y_1*, which in turn generates *Y_2*, etc., up to *Y_{Z-1}*, which generates *Y_Z*.
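Read as a factorization of the joint distribution of the hidden variables, the two topologies could be written as follows. This is a sketch in notation of our own choosing; it assumes that, in the layered model, the *Y_z* are conditionally independent given *Y_0*, which the description above suggests but does not state outright.

```latex
% Layered: every Y_z depends directly on the top-level variable Y_0
P(Y_0, Y_1, \dots, Y_Z) = P(Y_0) \prod_{z=1}^{Z} P(Y_z \mid Y_0)

% Chained: every Y_z depends on its predecessor Y_{z-1}
P(Y_0, Y_1, \dots, Y_Z) = P(Y_0) \prod_{z=1}^{Z} P(Y_z \mid Y_{z-1})
```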

In both models, the *Y_z*'s generate the observed mixture random variables *X_1, …, X_Z*, from which the elements of the observed data `X` are assumed to be drawn. (That is, `Z = length(X)`, the number of list elements in `X`.) The relationship between each *Y_z* and *X_z* is the same as the relationship between *Y* and *X* in `mixmod`.
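To make the generative story concrete, the following base R sketch draws data from a layered model with two data sources. Everything in it is hypothetical and chosen only for illustration: the names `p0`, `trans`, and `mu`, the parameter values, and the use of unit-variance univariate normal components. It does not call any function from the package.

```r
# Illustrative layered model; all names and values below are assumptions.
set.seed(1)
N  <- 500          # number of observations
K0 <- 2            # number of states of the top-level variable Y_0
K  <- c(2, 3)      # numbers of states of Y_1 and Y_2, so Z = 2

p0 <- c(0.6, 0.4)  # P(Y_0 = k0)

# trans[[z]][k0, k] = P(Y_z = k | Y_0 = k0); each row sums to 1
trans <- list(
  matrix(c(0.9, 0.1,
           0.2, 0.8), nrow = K0, byrow = TRUE),
  matrix(c(0.7, 0.2, 0.1,
           0.1, 0.3, 0.6), nrow = K0, byrow = TRUE)
)

# Component means of the observed variables (standard deviation fixed at 1)
mu <- list(c(-2, 2), c(-3, 0, 3))

# Step 1: draw the top-level hidden variable
y0 <- sample.int(K0, N, replace = TRUE, prob = p0)

# Step 2: draw each lower-level hidden variable given Y_0
Y <- lapply(seq_along(K), function(z) {
  vapply(y0, function(k0) sample.int(K[z], 1, prob = trans[[z]][k0, ]),
         integer(1))
})

# Step 3: draw each observed variable given its own hidden variable
X <- lapply(seq_along(K), function(z) {
  rnorm(N, mean = mu[[z]][Y[[z]]], sd = 1)
})
names(X) <- c("x1", "x2")
```

The resulting `X` is a list of numeric vectors, which is the shape described for the `X` argument, with `K` giving the number of components per data source. A chained version would replace `y0` in step 2 with the previously drawn hidden variable.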

As in `mixmod`, the EM algorithm attempts to maximize the Q-value, that is, the expected complete data (hidden and observed variables) log-likelihood.
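In the usual EM formulation, the Q-value at iteration *t* can be written as below. The symbols θ (the model parameters) and θ^(t) (their current estimate) are notation introduced here for illustration, not taken from the package.

```latex
% Expected complete-data log-likelihood, with the expectation taken over the
% hidden variables Y = (Y_0, ..., Y_Z) given the observed data X = (X_1, ..., X_Z)
% and the current parameter estimate \theta^{(t)}
Q(\theta \mid \theta^{(t)})
  = \mathbb{E}_{Y \mid X,\, \theta^{(t)}} \left[ \log p(X, Y \mid \theta) \right]
```

Each iteration computes this expectation (E-step) and then chooses the next parameter estimate to maximize it (M-step).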

### Value

A list of class `mdmixmod`, a subclass of `mixmod`, having the following elements:

`N`
the length of the data, that is,

`Z`
the size of the data, that is, `length(X)`.

`D`
the vector of widths of the data, that is,

`K`
the vector of the numbers of components in the lower-level mixture models.

`K0`
the number of components in the top-level mixture model, that is,

`X`
the original data, with data frames converted to matrices. If the elements of

`npar`
the total number of parameters in the model.

`npar.hidden`
the number of parameters for the hidden component portion of the model.

`npar.observed`
the number of parameters for the observed data portion of the model.

`iter`
the number of iterations required to fit the model.

`params`
the parameters estimated for the model. This is a list with elements

`stats`
a vector with named elements corresponding to the number of iterations, log-likelihood, Q-value, and BIC for the estimated parameters.

`weights`
a list with elements

`pdfs`
a list with elements

`posterior`
the

`assignment`
the vector of length

`iteration.params`
a list of length

`iteration.stats`
a data frame of

`topology`
the topology of the model.

`family`
the vector of names of the distribution families used in the model. See

`distn`
the vector of names of the actual distributions used in the model. See

`iter.max`
the maximum number of iterations allowed in model fitting.

`dname`
the name of the data.

`dattr`
attributes of the data, used by model likelihood functions to determine if the data have been scaled or otherwise transformed.

`zvec`
the vector of names of

`kvec`
a list of which the

`k0vec`
a vector of integers from 1 to

`prior`
the value of the

`marginals`
if

### Author(s)

Daniel Dvorkin

### References

McLachlan, G.J. and Krishnan, T. (2008) *The EM Algorithm and Extensions*, John Wiley & Sons.

### See Also

`LC_FAMILY` for distributions and families; `mixmod` for fitting single-data mixture models; `reporting` and `likelihood` for model reporting; `rocinfo` for performance evaluation; `convergencePlot` for behavior of the algorithm; `simulation` for simulating from the parameters of a model.

### Examples

```
## Not run:
data(CiData)
data(CiGene)

# Fit a chained model with (2,3,2) components for the three data sources
fit <- mdmixmod(CiData, c(2,3,2), topology="chained",
    family=c("pvii", "norm", "pvii"))
fit
# Chained (PVII, normal, PVII) mixture model ('pvii', 'mvnorm', 'pvii')
# Data 'CiData' of size 10244-by-(1,4,1) fitted to 2 (2,3,2) components
# Model statistics:
#      iter      llik      qval        bic     iclbic
#    377.00 -75859.81 -87065.28 -152310.62 -174721.56

# Collect the chained fit together with its marginal models and plot ROC curves
margs <- marginals(fit)
allFits <- c(list(chained=fit), margs)
plot(multiroc(allFits, CiGene$target))
## End(Not run)
```