DataGenCKM: The function generates datasets that follow the typical...

View source: R/DataGenCKM.R

DataGenCKM    R Documentation

Description

The function generates datasets that follow the typical K-means model, with the option of including masking variables, that is, variables that do not contribute to the cluster structure. In this simplified version, all signaling variables share the same mean within each cluster, and the variance is equal across all variables and clusters (unless varsplit is set to 1).

Usage

DataGenCKM(n.obs, n.cluster, n.validvar, n.noisevar, mu, var, varsplit = 0)

Arguments

n.obs

the total number of observations

n.cluster

the total number of clusters

n.validvar

the total number of signaling variables

n.noisevar

the total number of masking variables

mu

a vector of length n.cluster whose elements give the mean value of each cluster. It can also be a single number that gives the distance between the means of neighboring clusters

var

a number indicating the variance of each variable

varsplit

either 0 or 1 (the default is 0); when 1, the variance of half of the variables equals var/2
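
To make the effect of varsplit concrete, here is a minimal base R sketch of a per-variable variance vector in which half of the variables get var/2. The documentation does not say which half is affected, so assigning it to the first half of the columns is an assumption, and the names n.var, col.var and x are hypothetical and not part of the package.

```r
n.var <- 6   # illustrative number of variables
var <- 1     # common variance, as in the var argument

# Assumption: the halved variance goes to the first half of the columns.
col.var <- rep(var, n.var)
col.var[seq_len(n.var %/% 2)] <- var / 2

# One column per variable, each drawn with its own standard deviation.
set.seed(1)
x <- sapply(col.var, function(v) rnorm(200, mean = 0, sd = sqrt(v)))
```

Here col.var holds 0.5 for the first three variables and 1 for the rest, and x is a 200 by 6 matrix.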

Value

a list of two elements. The first is the generated dataset; the second is a vector of length n.obs containing the cluster assignment of each observation
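
The model described above can be illustrated with a short base R sketch. This is not the package's implementation: sketch_ckm_data is a hypothetical stand-in, and it assumes that a scalar mu expands to cluster means 0, mu, 2 * mu, ..., that observations are spread over clusters as evenly as possible, that masking variables have mean 0, and that signaling variables occupy the first columns. The varsplit option is left out.

```r
# Hypothetical stand-in for illustration only; not the package's DataGenCKM().
sketch_ckm_data <- function(n.obs, n.cluster, n.validvar, n.noisevar, mu, var) {
  # Assumption: a scalar mu is the spacing between neighboring cluster means.
  if (length(mu) == 1) mu <- (seq_len(n.cluster) - 1) * mu
  stopifnot(length(mu) == n.cluster)

  # Assumption: observations are assigned to clusters as evenly as possible.
  assign <- rep(seq_len(n.cluster), length.out = n.obs)

  # Every variable gets independent Gaussian noise with variance var.
  n.var <- n.validvar + n.noisevar
  x <- matrix(rnorm(n.obs * n.var, mean = 0, sd = sqrt(var)),
              nrow = n.obs, ncol = n.var)

  # Signaling variables (assumed to be the first columns): shift each row
  # by the mean of its cluster. Masking variables keep mean 0.
  if (n.validvar > 0) {
    sig <- seq_len(n.validvar)
    x[, sig] <- x[, sig] + mu[assign]
  }

  # Same shape as the documented return value: dataset, then assignments.
  list(x, assign)
}

set.seed(1)
sim <- sketch_ckm_data(n.obs = 60, n.cluster = 3, n.validvar = 20,
                       n.noisevar = 100, mu = 1, var = 1)
dim(sim[[1]])    # 60 rows, 120 columns
table(sim[[2]])  # 20 observations in each of the 3 clusters
```

Under these assumptions only the first 20 columns differ in mean between clusters; the remaining 100 are pure noise, which is what makes them masking variables.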

Examples

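# Simulation settings
ncluster <- 3      # number of clusters
nobs <- 60         # number of observations
nnoisevar <- 100   # number of masking variables
nvalidvar <- 20    # number of signaling variables
mu <- 1            # distance between neighboring cluster means
var <- 1           # variance of each variable

# Generate the data (varsplit keeps its default of 0)
sim.data <- DataGenCKM(nobs, ncluster, nvalidvar, nnoisevar, mu, var)
dataset <- sim.data[[1]]          # the generated dataset
cluster.assign <- sim.data[[2]]   # cluster assignment of each observation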

syuanuvt/CKM documentation built on Dec. 1, 2022, 9:06 p.m.