synthetic_data: Generating Point-level Data Having Several Groups

View source: R/HCV.R

synthetic_data    R Documentation

Description

Generates synthetic point-level data with several groups, based on the method proposed by Lin et al. (2005).

Usage

synthetic_data(k, f, r, n, feature, geometry, homogeneity = TRUE)

Arguments

k

integer specifying the number of groups.

f

positive number controlling the concentration of generated samples toward large groups.

r

positive number controlling the variance of individual attributes on the feature domain.

n

integer specifying the total number of sampled points.

feature

integer specifying the number of attributes for the feature domain.

geometry

integer specifying the number of attributes for the geometry domain.

homogeneity

logical indicating whether to force the centers of the feature domain to be the same as those of the geometry domain. Default is TRUE.

Value

A list containing two matrices and a vector of labels. One matrix holds the feature domain and the other holds the geometry domain; each contains the n sampled points. The vector of labels indicates which group each sampled point belongs to. In the Examples section these components are accessed as feat, geo, and labels.
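As a minimal sketch of how the returned list might be inspected, the code below reuses the argument values and the component names (feat, geo, labels) from the Examples section. It assumes the HCV package is installed and skips silently otherwise; no particular output is implied.

```r
# Sketch only: runs when the HCV package is available.
if (requireNamespace("HCV", quietly = TRUE)) {
  set.seed(0)
  pcase <- HCV::synthetic_data(k = 3, f = 30, r = 0.02, n = 100,
                               feature = 2, geometry = 2)

  # Overall structure of the returned list
  str(pcase)

  # Dimensions of the feature-domain and geometry-domain matrices
  print(dim(pcase$feat))
  print(dim(pcase$geo))

  # Number of sampled points per group
  print(table(pcase$labels))
}
```

Calls to print() are explicit because values computed inside an if block are not auto-printed.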

Author(s)

ShengLi Tzeng and Hao-Yun Hsu.

References

Lin, C. R., Liu, K. H., and Chen, M. S. (2005). Dual clustering: integrating data clustering over optimization and constraint domains. IEEE Transactions on Knowledge and Data Engineering, 17(5), 628-637.

Examples

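library(HCV)

# Fix the random seed so the generated data are reproducible
set.seed(0)
# 3 groups, 100 points, 2 feature attributes and 2 geometry attributes
pcase <- synthetic_data(k = 3, f = 30, r = 0.02, n = 100, feature = 2, geometry = 2)
# Save the graphical parameters so they can be restored afterwards
oldpar <- par(no.readonly = TRUE)
par(mfrow = c(1, 2))
# Map the group labels to three plotting colors
labcolor <- (pcase$labels + 1) %% 3 + 1
plot(pcase$feat, col = labcolor, pch = 19, xlab = 'First attribute',
  ylab = 'Second attribute', main = 'Feature domain')
plot(pcase$geo, col = labcolor, pch = 19, xlab = 'First attribute',
  ylab = 'Second attribute', main = 'Geometry domain')
# Restore the original graphical parameters
par(oldpar)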
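The effect of the homogeneity argument can be explored in a similar way. The sketch below generates one data set with homogeneity = TRUE and one with homogeneity = FALSE, then prints the per-group means in each domain. The helper center_by_group is hypothetical (it is not part of HCV), and using group means as a stand-in for the generating centers is an assumption made for illustration only.

```r
# Sketch only: runs when the HCV package is available.
if (requireNamespace("HCV", quietly = TRUE)) {
  set.seed(0)
  same_centers <- HCV::synthetic_data(k = 3, f = 30, r = 0.02, n = 100,
                                      feature = 2, geometry = 2,
                                      homogeneity = TRUE)
  set.seed(0)
  free_centers <- HCV::synthetic_data(k = 3, f = 30, r = 0.02, n = 100,
                                      feature = 2, geometry = 2,
                                      homogeneity = FALSE)

  # Hypothetical helper: mean of every attribute within every group
  center_by_group <- function(m, labels) {
    apply(m, 2, function(column) tapply(column, labels, mean))
  }

  # Group means with homogeneity = TRUE
  print(center_by_group(same_centers$feat, same_centers$labels))
  print(center_by_group(same_centers$geo, same_centers$labels))

  # Group means with homogeneity = FALSE
  print(center_by_group(free_centers$feat, free_centers$labels))
  print(center_by_group(free_centers$geo, free_centers$labels))
}
```

Comparing the feature-domain and geometry-domain means within each call gives a rough view of how the two domains relate under each setting.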


HCV documentation built on March 18, 2022, 6:01 p.m.
