# simulateData: Dataset generation

In bgmm: Gaussian Mixture Modeling Algorithms and the Belief-Based Mixture Modeling

## Description

The function `simulateData` generates an artificial dataset from a mixture of Gaussian components with a given set of parameters.

## Usage

```r
simulateData(d = 2, k = 4, n = 100, m = 10, mu = NULL, cvar = NULL,
             s.pi = rep(1/k, k), b.min = 0.02, mean = "D", between = "D",
             within = "D", cov = "D", n.labels = k)
```

## Arguments

- `d`: the dimension of the data set.
- `k`: the number of model components.
- `n`: the total number of observations, both labeled and unlabeled.
- `mu`: a matrix with `k` rows and `d` columns that defines the mean vectors of the corresponding model components. If not specified, its values are generated from a normal distribution N(0,49).
- `cvar`: a three-dimensional array with dimensions (`k`, `d`, `d`). If not specified, each covariance matrix is generated in three steps: first, 2*`d` samples are drawn from a `d`-dimensional normal distribution N(0, I_d). Next, the `d` x `d` covariance matrix of these samples is calculated. Finally, the resulting sample covariance matrix is scaled by a factor drawn from an exponential distribution Exp(1).
- `s.pi`: a vector of `k` probabilities, i.e. the mixing proportions of the model. The mixing proportions specify a multinomial distribution over the components, from which the number of observations in each cluster is generated. By default a uniform distribution is used.
- `mean`, `between`, `within`, `cov`: constraints on the model structure. By default all are equal to "D". If other values are set, the parameters `mu` and `cvar` are adjusted to match the specified constraints.
- `m`: the number of observations for which the beliefs are to be calculated.
- `b.min`: the belief that an observation does not belong to a component. Formally, the belief b_ij that observation i belongs to component j is equal to `b.min` if i was not generated from component j. Thus, the belief that i belongs to its true component is set to `1-b.min*(n.labels-1)`, and `b.min` is constrained so that `b.min < 1/n.labels`. By default `b.min=0.02`.
- `n.labels`: the number of components used as labels, which defines the number of columns in the resulting beliefs matrix. By default `n.labels` equals `k`, but the user can specify a smaller number. With this argument the user can define a scenario in which the data are generated from a mixture of three components, but only two of them are used as labels in the beliefs matrix (applied in the example below).
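
The default parameter generation described for `cvar` and `s.pi` can be sketched in base R. This is only an illustration of the documented steps, not the package's actual code: the helper `sketch_cov` and the variables `cvar_sketch` and `sizes` are hypothetical names introduced here.

```r
# Hypothetical helper (not part of bgmm): one covariance matrix generated
# by the three documented steps.
sketch_cov <- function(d) {
  # Step 1: 2*d samples from N(0, I_d); with an identity covariance the
  # coordinates are independent standard normals.
  samples <- matrix(rnorm(2 * d * d), nrow = 2 * d, ncol = d)
  # Step 2: the d x d sample covariance matrix.
  S <- stats::cov(samples)
  # Step 3: scale by a factor drawn from Exp(1).
  S * rexp(1, rate = 1)
}

set.seed(1)
d <- 2; k <- 3; n <- 100

# Array with dimensions (k, d, d), matching the shape expected for `cvar`.
cvar_sketch <- array(0, dim = c(k, d, d))
for (j in seq_len(k)) cvar_sketch[j, , ] <- sketch_cov(d)

# Cluster sizes drawn from a multinomial distribution over the components,
# here with the default uniform mixing proportions.
s.pi <- rep(1 / k, k)
sizes <- as.vector(rmultinom(1, size = n, prob = s.pi))
```

An array built this way could be passed as `cvar` when a reproducible covariance structure is wanted instead of the internally generated default.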

## Value

A list with the following elements:

- `X`: the matrix with n-m rows and d columns containing the generated values of the unlabeled observations.
- `knowns`: the matrix with m rows and d columns containing the generated values of the labeled observations.
- `B`: the belief matrix with m rows and k columns derived for the `knowns` matrix.
- `model.params`: the list of model parameters.
- `Ytrue`: indices of the true Gaussian components from which each observation was generated. Labels for the knowns come first.
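
How the entries of `B` follow from `b.min` and `n.labels` can be sketched as below. This is a minimal illustration of the rule stated for `b.min`, not the package's implementation; `sketch_beliefs` is a hypothetical helper, and it assumes every labeled observation comes from one of the first `n.labels` components (the source does not say how other components are handled).

```r
# Hypothetical helper (not part of bgmm): belief matrix for labeled
# observations whose true components are given in `labels`.
sketch_beliefs <- function(labels, n.labels, b.min = 0.02) {
  # b.min must be below 1/n.labels; labels outside 1..n.labels are not covered.
  stopifnot(b.min < 1 / n.labels, all(labels %in% seq_len(n.labels)))
  # Every entry starts at b.min ...
  B <- matrix(b.min, nrow = length(labels), ncol = n.labels)
  # ... and the entry for the true component gets the remaining belief.
  B[cbind(seq_along(labels), labels)] <- 1 - b.min * (n.labels - 1)
  B
}

B <- sketch_beliefs(labels = c(1, 3, 2, 3), n.labels = 3)
rowSums(B)  # each row sums to 1
```

With three labels and the default `b.min = 0.02`, the true component of each row receives belief `1 - 0.02 * 2 = 0.96`.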

## Author(s)

Przemyslaw Biecek

## References

Przemyslaw Biecek, Ewa Szczurek, Martin Vingron, Jerzy Tiuryn (2012), The R Package bgmm: Mixture Modeling with Uncertain Knowledge, Journal of Statistical Software.

## Examples

```r
library(bgmm)

# Three components in two dimensions, only two of them used as labels
simulated = simulateData(d=2, k=3, n=300, m=60, cov="0", within="E", n.labels=2)
model = belief(X = simulated$X, knowns = simulated$knowns, B=simulated$B)
plot(model)

# Two components in one dimension
simulated = simulateData(d=1, k=2, n=300, m=60, n.labels=2)
model = belief(X = simulated$X, knowns = simulated$knowns, B=simulated$B)
plot(model)
```

### Example output

```
Loading required package: mvtnorm
```