simulateData: Dataset generation
In bgmm: Gaussian Mixture Modeling Algorithms and the Belief-Based Mixture Modeling


View source: R/getSimulatedKdimensionalData.r

The function simulateData generates an artificial dataset from a mixture of Gaussian components with a given set of parameters.


 simulateData(d = 2, k = 4, n = 100, m = 10, mu = NULL, cvar = NULL, 
    s.pi = rep(1/k, k), b.min = 0.02, mean = "D", between = "D", 
    within = "D", cov = "D", n.labels = k)

`d`	the dimension of the data set,
`k`	the number of the model components,
`n`	the total number of observations, both labeled and unlabeled,
`mu`	a matrix with `k` rows and `d` columns, which defines the mean vectors of the corresponding model components. If not specified, its values are generated from a normal distribution N(0,49),
`cvar`	a three-dimensional array with the dimensions (`k`, `d`, `d`). If not specified, each covariance matrix is generated in three steps: first, 2*`d` samples are drawn from a `d`-dimensional normal distribution N(0, I_d). Next, the d x d sample covariance matrix of these samples is calculated. Finally, the resulting sample covariance matrix is scaled by a factor drawn from an exponential distribution Exp(1),
`s.pi`	a vector of `k` probabilities, i.e. the mixing proportions of the model. The mixing proportions specify a multinomial distribution over the components, from which the numbers of observations in each cluster are generated. By default a uniform distribution is used.
`mean, between, within, cov`	constraints on the model structure. By default all are equal to "D". If other values are set, the parameters `mu` and `cvar` are adjusted to match the specified constraints,
`m`	the number of the observations, for which the beliefs are to be calculated,
`b.min`	the belief that an observation does not belong to a component. Formally, the belief b_ij that observation i belongs to component j equals `b.min` if i was not generated from component j. Thus, the belief that i belongs to its true component is set to `1-b.min*(n.labels-1)`, and `b.min` is constrained so that `b.min < 1/n.labels`. By default `b.min=0.02`,
`n.labels`	the number of components used as labels, defining the number of columns in the resulting beliefs matrix. By default `n.labels` equals `k`, but the user can specify a smaller number. Using this argument the user can define a scenario in which the data are generated from a mixture of three components, but only two of them are used as labels in the beliefs matrix (applied in the example below).
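
The default parameter generation described for `mu`, `cvar` and `s.pi` can be sketched in base R as follows. This is only an illustration of the steps listed above, not the package's actual source: the helper name `random_cov` is hypothetical, and N(0,49) is assumed to denote a variance of 49 (standard deviation 7).

```r
# Sketch of the default parameter generation (base R only; not the bgmm source).
set.seed(1)
d <- 2; k <- 4; n <- 100

# Means: k x d matrix drawn from N(0, 49).
# Assumption: 49 is the variance, so the standard deviation is 7.
mu <- matrix(rnorm(k * d, mean = 0, sd = 7), nrow = k, ncol = d)

# Hypothetical helper: one covariance matrix, built in three steps.
random_cov <- function(d) {
  # Step 1: 2*d samples from N(0, I_d), one sample per row.
  z <- matrix(rnorm(2 * d * d), nrow = 2 * d, ncol = d)
  # Step 2: d x d sample covariance; Step 3: scale by an Exp(1) factor.
  cov(z) * rexp(1, rate = 1)
}

# Stack k covariance matrices into an array with dimensions (k, d, d).
cvar <- array(0, dim = c(k, d, d))
for (j in seq_len(k)) cvar[j, , ] <- random_cov(d)

# Cluster sizes: one multinomial draw using the mixing proportions s.pi.
s.pi <- rep(1 / k, k)
sizes <- as.vector(rmultinom(1, size = n, prob = s.pi))
```

Here `sizes` always sums to `n`, and each slice `cvar[j, , ]` is a symmetric `d` x `d` matrix.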

A list with the following elements:

`X`	the matrix of size n-m rows and d columns with generated values of unlabeled observations,
`knowns`	the matrix of size m rows and d columns with generated values of labeled observations,
`B`	the belief matrix with m rows and `n.labels` columns (k columns by default), derived for the knowns matrix,
`model.params`	the list of model parameters,
`Ytrue`	indexes of the true Gaussian components from which each observation was generated. Labels for knowns go first.
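
To make the structure of `B` concrete, the belief rule given for `b.min` can be sketched as below. The function `make_beliefs` is a hypothetical helper written for illustration, not part of bgmm, and it assumes every labeled observation comes from one of the first `n.labels` components.

```r
# Hypothetical helper illustrating the belief rule (not the bgmm source).
# labels:   true component index of each labeled observation
# n.labels: number of columns of the belief matrix
# b.min:    belief assigned to every non-generating component
make_beliefs <- function(labels, n.labels, b.min = 0.02) {
  # Assumption: all labels lie in 1..n.labels, and b.min < 1/n.labels.
  stopifnot(b.min < 1 / n.labels, all(labels %in% seq_len(n.labels)))
  # Start with b.min everywhere ...
  B <- matrix(b.min, nrow = length(labels), ncol = n.labels)
  # ... then give the true component the remaining belief mass.
  B[cbind(seq_along(labels), labels)] <- 1 - b.min * (n.labels - 1)
  B
}

B <- make_beliefs(labels = c(1, 3, 2, 3), n.labels = 3, b.min = 0.02)
```

With `n.labels = 3` and `b.min = 0.02`, the true component receives a belief of 1 - 0.02 * 2 = 0.96, so each row of `B` sums to one.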

Przemyslaw Biecek

Przemyslaw Biecek, Ewa Szczurek, Martin Vingron, Jerzy Tiuryn (2012), The R Package bgmm: Mixture Modeling with Uncertain Knowledge, Journal of Statistical Software.

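 library(bgmm)

 # Data from a mixture of three components, only two of which are used as labels
 simulated = simulateData(d=2, k=3, n=300, m=60, cov="0", within="E", n.labels=2)
 model = belief(X = simulated$X, knowns = simulated$knowns, B=simulated$B)
 plot(model)

 # One-dimensional data from two components, both used as labels
 simulated = simulateData(d=1, k=2, n=300, m=60, n.labels=2)
 model = belief(X = simulated$X, knowns = simulated$knowns, B=simulated$B)
 plot(model)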
