# mModel: Fitting Gaussian Mixture Models in the bgmm Package

## Description

These functions fit different variants of Gaussian mixture models. The variants differ in the amount of prior knowledge used in the fitting procedure.

## Usage

```r
belief(X, knowns, B = NULL,
  k = ifelse(!is.null(B), ncol(B),
        ifelse(!is.null(P), ncol(P), length(unique(class)))),
  P = NULL, class = map(B),
  init.params = init.model.params(X, knowns, B = B, P = P, class = class, k = k),
  model.structure = getModelStructure(),
  stop.likelihood.change = 10^-5, stop.max.nsteps = 100, trace = FALSE,
  b.min = 0.025, all.possible.permutations = FALSE, pca.dim.reduction = NA)

soft(X, knowns, P = NULL,
  k = ifelse(!is.null(P), ncol(P),
        ifelse(!is.null(B), ncol(B), length(unique(class)))),
  B = NULL, class = NULL,
  init.params = init.model.params(X, knowns, class = class, B = P, k = k),
  model.structure = getModelStructure(),
  stop.likelihood.change = 10^-5, stop.max.nsteps = 100, trace = FALSE,
  b.min = 0.025, all.possible.permutations = FALSE, pca.dim.reduction = NA, ...)

semisupervised(X, knowns, class = NULL,
  k = ifelse(!is.null(class), length(unique(class)),
        ifelse(!is.null(B), ncol(B), ncol(P))),
  B = NULL, P = NULL, ..., init.params = NULL,
  all.possible.permutations = FALSE, pca.dim.reduction = NA)

supervised(knowns, class = NULL, k = length(unique(class)),
  B = NULL, P = NULL, model.structure = getModelStructure(), ...)

unsupervised(X, k,
  init.params = init.model.params(X, knowns = NULL, k = k),
  model.structure = getModelStructure(),
  stop.likelihood.change = 10^-5, stop.max.nsteps = 100, trace = FALSE, ...)
```

## Arguments

- `X`: a data.frame with the unlabeled observations. The rows correspond to the observations and the columns to variables/dimensions of the data.
- `knowns`: a data.frame with the labeled observations. The rows correspond to the observations and the columns to variables/dimensions of the data.
- `B`: a beliefs matrix which specifies the distribution of beliefs for the labeled observations. The number of rows in `B` should equal the number of rows in the data.frame `knowns`. It is assumed that the observations in `B` and in `knowns` are given in the same order. Columns correspond to the model components. If the matrix `B` is provided, its number of columns has to be less than or equal to `k`. Internally, the matrix `B` is completed to `k` columns.
- `P`: a matrix of plausibilities, i.e., weights of the prior probabilities for the labeled observations. If the matrix `P` is provided, its number of columns has to be less than or equal to `k`. The same conditions as for `B` apply.
- `class`: a vector of classes/labels for the labeled observations. The number of its unique values has to be less than or equal to `k`.
- `k`: the number of components, by default equal to the number of columns of `B`.
- `init.params`: initial values for the estimates of the model parameters (means, variances and mixing proportions), by default derived with the `init.model.params` function.
- `stop.likelihood.change, stop.max.nsteps`: parameters of the EM algorithm defining the stop criteria, i.e., the minimum required improvement of the log-likelihood and the maximum number of steps.
- `trace`: if `trace=TRUE`, the log-likelihood for every step of the EM algorithm is printed out.
- `model.structure`: an object returned by the `getModelStructure` function, which specifies constraints for the parameters of the model to be fitted.
- `b.min`: this argument is passed to the `init.model.params` function.
- `...`: these arguments are passed to the `init.model.params` function.
- `all.possible.permutations`: if `TRUE`, all possible permutations of the components in the initial parameters are considered. Since there are `k!` such permutations, the model fitting is repeated `k!` times, and only the model with the highest likelihood is returned.
- `pca.dim.reduction`: since fitting in a high-dimensional space is numerically unstable, PCA dimension reduction is attempted unless `pca.dim.reduction = FALSE`. If equal to `NA`, the target dimension is data driven; if it is a number, that number is used as the target dimension.

## Details

In the `belief()` function, if the argument `B` is not provided, it is by default initialized from the argument `P`. If the argument `P` is not provided, `B` is derived from the `class` argument with the use of the function `get.simple.beliefs()` which assigns `1-(k-1)*b.min` to the component given by `class` and `b.min` to all remaining components.
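The default construction of `B` from `class` can be sketched in plain R as follows. This is an illustrative re-implementation of the behaviour described above; the helper name `make_beliefs` is hypothetical, and the exact internals of `get.simple.beliefs()` in bgmm may differ.

```r
# Build a beliefs matrix from class labels: the labeled component gets
# 1 - (k - 1) * b.min, every other component gets b.min (illustrative sketch).
make_beliefs <- function(class, k, b.min = 0.025) {
  class <- as.integer(factor(class))          # map labels to 1, 2, ...
  B <- matrix(b.min, nrow = length(class), ncol = k)
  B[cbind(seq_along(class), class)] <- 1 - (k - 1) * b.min
  B
}

B <- make_beliefs(c("AA", "AB", "AA"), k = 3)
rowSums(B)  # each row sums to 1
```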

In the `soft()` function, if the argument `P` is not provided, it is by default initialized from the argument `B`. If the argument `B` is not provided, `P` is derived from the `class` argument as in the `belief()` function.

In the `supervised()` function, if the argument `class` is not provided, it is by default initialized from the argument `B` or `P`, taking the label of each observation as its most believed or most plausible component (by the MAP rule).
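The MAP rule mentioned above amounts to picking, for each row of `B` or `P`, the column with the largest value. A minimal sketch in base R (the helper name `map_labels` is hypothetical; bgmm's internal `map()` may differ in details):

```r
# MAP rule: assign each observation to the component with the highest
# belief/plausibility in its row (illustrative sketch).
map_labels <- function(B) apply(B, 1, which.max)

B <- matrix(c(0.7, 0.3,
              0.1, 0.9), nrow = 2, byrow = TRUE)
map_labels(B)  # component indices 1 and 2
```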

The number of columns in the beliefs matrix `B` or in the matrix of plausibilities `P` may be smaller than the number of model components defined by the argument `k`. Such a situation corresponds to the scenario in which the user does not know any examples for some component. In other words, this component is not used as a label for any observation, and thus can be omitted from the beliefs matrix. An equivalent would be to include a column for this component and fill it with beliefs/plausibilities equal to 0.
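Conceptually, the internal completion of `B` to `k` columns corresponds to padding with zero columns, as in this sketch (an assumption based on the description above; bgmm performs the completion internally):

```r
# Beliefs over 2 labeled components, while the model has k = 3 components.
B <- matrix(c(0.9, 0.1,
              0.2, 0.8), nrow = 2, byrow = TRUE)

# Padding with a zero column for the component with no labeled examples
# is equivalent to omitting that column (illustrative sketch).
B_full <- cbind(B, 0)
</imports>
```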

Slots in the returned object are listed in the section Value. The returned object differs slightly depending on the function used. Namely, the `belief()` function returns an object with the slot `B`. The function `soft()` returns an object with a slot `P`, while the functions `supervised()` and `semisupervised()` return objects with a slot `class` instead.

The object returned by the function `supervised()` does not have the slot `X`.

## Value

An object of the class `mModel`, with the following slots:

- `pi`: a vector with the fitted mixing proportions
- `mu`: a matrix with the mean vectors, fitted for all components
- `cvar`: a three-dimensional array with the covariance matrices, fitted for all components
- `X`: the unlabeled observations
- `knowns`: the labeled observations
- `B`: the beliefs matrix
- `n`: the number of all observations
- `m`: the number of unlabeled observations
- `k`: the number of fitted model components
- `d`: the data dimension
- `likelihood`: the log-likelihood of the fitted model
- `n.steps`: the number of steps performed by the EM algorithm
- `model.structure`: the set of constraints kept during the fitting process

## Author(s)

Przemyslaw Biecek

## References

Przemyslaw Biecek, Ewa Szczurek, Martin Vingron, Jerzy Tiuryn (2012), The R Package bgmm: Mixture Modeling with Uncertain Knowledge, Journal of Statistical Software.

## Examples

```r
data(genotypes)

modelSupervised = supervised(knowns = genotypes$knowns,
                             class = genotypes$labels)
plot(modelSupervised)

modelSemiSupervised = semisupervised(X = genotypes$X,
                                     knowns = genotypes$knowns,
                                     class = genotypes$labels)
plot(modelSemiSupervised)

modelBelief = belief(X = genotypes$X, knowns = genotypes$knowns,
                     B = genotypes$B)
plot(modelBelief)

modelSoft = soft(X = genotypes$X, knowns = genotypes$knowns,
                 P = genotypes$B)
plot(modelSoft)

modelUnSupervised = unsupervised(X = genotypes$X, k = 3)
plot(modelUnSupervised)
```

bgmm documentation built on Oct. 10, 2021, 5:07 p.m.