synthetic_data | R Documentation |
Generation of synthetic point-level data based on a method proposed by Lin et al. (2005).
synthetic_data(k, f, r, n, feature, geometry, homogeneity = TRUE)
k | integer specifying the number of groups. |
f | positive number controlling the concentration of generated samples toward large groups. |
r | positive number controlling the variance of individual attributes on the feature domain. |
n | integer specifying the total number of sampled points. |
feature | integer specifying the number of attributes for the feature domain. |
geometry | integer specifying the number of attributes for the geometry domain. |
homogeneity | logical indicating whether to force the centers of the feature domain to be the same as those of the geometry domain. Default is TRUE. |
A list with two matrices and a vector of labels. One matrix is for the feature domain and the other is for the geometry domain, both of which have n sampled points. The vector of labels indicates which cluster each sample belongs to.
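The structure of the returned list can be checked with a few lines of base R. The sketch below is illustrative only: `check_synthetic` is a hypothetical helper, not part of the package, and the mock list stands in for a real call to `synthetic_data`. It assumes that rows are sampled points and columns are attributes, and that the elements can be reached as `$feat`, `$geo`, and `$labels`, as in the Examples; the label values used in the mock are placeholders.

```r
# Hypothetical helper (not part of the package): returns TRUE when `x` has the
# documented shape, i.e. two matrices with n rows each and a vector of n labels.
# Assumption: rows are sampled points and columns are attributes.
check_synthetic <- function(x, n, feature, geometry) {
  stopifnot(is.list(x))
  feat <- x$feat    # `$` on a list allows partial name matching
  geo  <- x$geo
  lab  <- x$labels
  is.matrix(feat) && is.matrix(geo) &&
    nrow(feat) == n && nrow(geo) == n &&
    ncol(feat) == feature && ncol(geo) == geometry &&
    length(lab) == n
}

# Mock object standing in for the result of a call such as
# synthetic_data(3, 30, 0.02, 100, 2, 2); all values are random placeholders.
set.seed(0)
mock <- list(
  feat   = matrix(runif(200), nrow = 100, ncol = 2),
  geo    = matrix(runif(200), nrow = 100, ncol = 2),
  labels = sample(1:3, 100, replace = TRUE)
)

check_synthetic(mock, n = 100, feature = 2, geometry = 2)
table(mock$labels)  # number of sampled points per group
```

With the package loaded, the same helper could be applied to the actual return value in place of `mock`.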
ShengLi Tzeng and Hao-Yun Hsu.
Lin, C. R., Liu, K. H., and Chen, M. S. (2005). Dual clustering: integrating data clustering over optimization and constraint domains. IEEE Transactions on Knowledge and Data Engineering, 17(5), 628-637.
set.seed(0)
pcase <- synthetic_data(3,30,0.02,100,2,2)
oldpar <- par(no.readonly = TRUE)
par(mfrow=c(1,2))
labcolor <- (pcase$labels+1)%%3+1
plot(pcase$feat, col = labcolor, pch=19, xlab = 'First attribute',
     ylab = 'Second attribute', main = 'Feature domain')
plot(pcase$geo, col = labcolor, pch=19, xlab = 'First attribute',
     ylab = 'Second attribute', main = 'Geometry domain')
par(oldpar)