The dataset consists of 2000 data points in R^{14}. On the subset of relevant clustering variables S = \{1, 2\}, the data are drawn from a mixture of four equiprobable spherical Gaussian distributions with means (0,0), (4,0), (0,2) and (4,2). The subset of redundant variables U = \{3, ..., 11\} is explained by the subset of predictor variables R = \{1, 2\} through a linear regression. The last three variables, W = \{12, 13, 14\}, are independent.
A data matrix with 2000 observations on 14 variables, followed by a last (15th) column that contains the labels.
scenarioCor[,1:14]
a numeric matrix containing the observations
scenarioCor[,15]
an integer vector containing the labels
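As an illustration of this layout, the following base R sketch separates the observations from the labels. The helper `split_scenario` and the `stand_in` matrix are hypothetical and only mimic the shape of the data (random values, labels assumed to take values 1 to 4); with the real object loaded, `scenarioCor` would be passed instead.

```r
# Hypothetical helper, not part of any package: split a matrix laid out
# like scenarioCor into observations and labels.
split_scenario <- function(m) {
  stopifnot(is.matrix(m), ncol(m) == 15)
  list(
    x = m[, 1:14, drop = FALSE],  # numeric observations
    z = as.integer(m[, 15])       # labels stored in the last column
  )
}

# Stand-in matrix with random values so the snippet runs on its own;
# replace it with the real scenarioCor once that object is loaded.
set.seed(1)
stand_in <- cbind(
  matrix(rnorm(2000 * 14), nrow = 2000, ncol = 14),
  sample(1:4, 2000, replace = TRUE)  # assumed label coding 1..4
)

parts <- split_scenario(stand_in)
dim(parts$x)    # dimensions of the observation matrix
table(parts$z)  # number of observations per label
```

Keeping `drop = FALSE` ensures the observations stay a matrix even if a single row is selected later.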
The subset U of redundant variables is simulated as follows:
x^{U} = (0, 0, 0.4, 0.8, ..., 2) + x^{S} b + \varepsilon, with \varepsilon \sim N(0_9, Ω)
The subset W of independent variables is simulated as follows:
x^{W} \sim N((3.2, 3.6, 4), I_3)
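The simulation scheme can be sketched in base R as follows. This is a rough illustration under stated assumptions, not the code used to generate the dataset: the intercept `a`, the coefficients `b` and the covariance `Omega` are hypothetical placeholders (the actual values are given in Maugis et al. (2009)), and unit variance is assumed for the spherical mixture components.

```r
set.seed(1)
n <- 2000

# Relevant block: mixture of four equiprobable spherical Gaussians.
# Unit variance is an assumption; the text only says "spherical".
mu <- rbind(c(0, 0), c(4, 0), c(0, 2), c(4, 2))
z <- sample(1:4, n, replace = TRUE)
x_S <- mu[z, ] + matrix(rnorm(n * 2), nrow = n, ncol = 2)

# Redundant block (nine variables): linear regression on x_S.
# a, b and Omega are PLACEHOLDERS, not the values of Maugis et al. (2009).
a <- rep(0, 9)
b <- matrix(runif(2 * 9, min = -1, max = 1), nrow = 2, ncol = 9)
Omega <- diag(9)
# Rows of eps have covariance Omega because t(chol(Omega)) %*% chol(Omega) == Omega.
eps <- matrix(rnorm(n * 9), nrow = n, ncol = 9) %*% chol(Omega)
x_U <- matrix(a, nrow = n, ncol = 9, byrow = TRUE) + x_S %*% b + eps

# Independent block (last three variables): N((3.2, 3.6, 4), I_3).
x_W <- matrix(c(3.2, 3.6, 4), nrow = n, ncol = 3, byrow = TRUE) +
  matrix(rnorm(n * 3), nrow = n, ncol = 3)

# Same layout as the dataset: 14 variables followed by the labels.
sim <- cbind(x_S, x_U, x_W, z)
dim(sim)
```

Because the placeholders differ from the published parameters, `sim` reproduces only the structure of the dataset, not its values.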
For more details on the regression coefficients b and the covariance matrix Ω, see Maugis et al. (2009).
Maugis, C., Celeux, G., and Martin-Magniette, M. L., 2009. "Variable selection in model-based clustering: A general variable role modeling". Computational Statistics and Data Analysis, vol. 53/11, pp. 3872-3882.