DataGenCKM: The function generates datasets that follow the typical...
In syuanuvt/CKM: Cardinality KM - simultaneous clustering and variable selection

View source: R/DataGenCKM.R

DataGenCKM

R Documentation

Description

The function generates datasets that follow the typical K-means model, with the option of including masking variables (variables that do not contribute to the cluster structure). In this simplified version, all signaling variables share the same mean within each cluster, and the variance is equal across all variables and clusters.
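
As a rough illustration of the model described above, the following base R sketch draws data with the same structure. It is not the package's implementation: the helper name `sketch_kmeans_data`, the near-equal cluster sizes, the normal distribution, and the zero mean of the masking variables are all assumptions made for illustration.

```r
# Hypothetical helper (not part of CKM): simulate data from a simple K-means model.
sketch_kmeans_data <- function(n_obs, n_cluster, n_valid, n_noise, mu, variance) {
  stopifnot(length(mu) == n_cluster)
  # Assumption: observations are split as evenly as possible across clusters.
  cluster_id <- rep(seq_len(n_cluster), length.out = n_obs)
  sigma <- sqrt(variance)
  # Signaling variables: every variable shares the mean of the observation's cluster.
  # The mean vector has length n_obs and is recycled column by column,
  # so row i always uses mu[cluster_id[i]].
  signal <- matrix(rnorm(n_obs * n_valid, mean = mu[cluster_id], sd = sigma),
                   nrow = n_obs, ncol = n_valid)
  # Masking variables: assumed to have mean 0 in every cluster.
  noise <- matrix(rnorm(n_obs * n_noise, mean = 0, sd = sigma),
                  nrow = n_obs, ncol = n_noise)
  list(data = cbind(signal, noise), cluster = cluster_id)
}

set.seed(1)
sim <- sketch_kmeans_data(60, 3, 20, 100, mu = c(-1, 0, 1), variance = 1)
dim(sim$data)       # 60 rows, 120 columns
table(sim$cluster)  # 20 observations per cluster
```

In this sketch the signaling variables occupy the first columns and the masking variables the last ones; the column order used by `DataGenCKM` itself is not stated here.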

Usage

DataGenCKM(n.obs, n.cluster, n.validvar, n.noisevar, mu, var, varsplit = 0)

Arguments

`n.obs`	the total number of observations
`n.cluster`	the total number of clusters
`n.validvar`	the total number of signaling variables
`n.noisevar`	the total number of masking variables
`mu`	a vector of length `n.cluster` whose elements indicate the mean value of each cluster. It can also be a single number that indicates the distance between neighboring clusters
`var`	a number indicating the variance of each variable
`varsplit`	either 0 or 1 (default value is 0); when 1, the variance of half of the variables equals `var/2`
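
The sketch below shows one way the two forms of `mu` and the `varsplit` switch could be interpreted. The helper names `expand_mu` and `split_variances`, the placement of the first cluster mean at 0, and the choice of which half of the variables receives `var/2` are assumptions; this page does not specify them.

```r
# Hypothetical helpers (not part of CKM).

# Turn `mu` into one mean per cluster.
expand_mu <- function(mu, n_cluster) {
  if (length(mu) == 1) {
    # Assumption: a single number is the gap between neighboring cluster means,
    # with the first cluster centered at 0.
    return((seq_len(n_cluster) - 1) * mu)
  }
  stopifnot(length(mu) == n_cluster)
  mu
}

# Build one variance per variable.
split_variances <- function(variance, n_var, varsplit = 0) {
  v <- rep(variance, n_var)
  if (varsplit == 1) {
    # Assumption: the first half of the variables receives variance / 2.
    v[seq_len(floor(n_var / 2))] <- variance / 2
  }
  v
}

expand_mu(1, 3)            # 0 1 2
expand_mu(c(-2, 0, 2), 3)  # -2 0 2
split_variances(1, 4, 1)   # 0.5 0.5 1.0 1.0
```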

Value

A list of two elements. The first is the generated dataset; the second is a vector of length `n.obs` that contains the cluster assignment of each observation.

Examples

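# Simulate 60 observations in 3 clusters with 20 signaling and 100 masking variables
ncluster <- 3
nobs <- 60
nnoisevar <- 100
nvalidvar <- 20
mu <- 1   # a single number: distance between neighboring clusters
var <- 1  # variance of each variable
sim.data <- DataGenCKM(nobs, ncluster, nvalidvar, nnoisevar, mu, var)
dataset <- sim.data[[1]]         # the generated dataset
cluster.assign <- sim.data[[2]]  # cluster assignment of each observation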

syuanuvt/CKM documentation built on Dec. 1, 2022, 9:06 p.m.