NormalDisData: A Dataset Containing Three Clusters of Normally Distributed Data


Description

A simulated dataset used in the 'SWKM' vignette to illustrate sparse weighted K-means clustering.

Usage


Format

A list with four components:

data

a 60 × 500 numeric matrix with 60 observations (rows) and 500 features (columns). Only the first 50 features are cluster-specific; the remaining 450 have mean zero in every cluster.

nCluster

the true number of clusters (3).

true.label

an integer vector of length 60 giving the true cluster of each observation.

noisy.label

an integer vector of length 10 giving the row indices of the noisy observations. A noisy observation has the same mean as the other observations in its cluster but a larger variance (5 rather than 1 in every feature).
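The four components must agree with one another in size. As a sketch, assuming the dimensions stated above, the hypothetical helper `check_normal_dis_data()` (not part of SWKM) encodes these constraints as base-R checks. The toy list at the end is a stand-in with the right shape, not the real dataset.

```r
# Hypothetical helper, not part of SWKM: stops with an error if a list does
# not match the documented NormalDisData format, otherwise returns TRUE.
check_normal_dis_data <- function(x, n = 60, p = 500, n_noisy = 10) {
  stopifnot(
    is.list(x),
    all(c("data", "nCluster", "true.label", "noisy.label") %in% names(x)),
    is.matrix(x$data), nrow(x$data) == n, ncol(x$data) == p,
    length(x$true.label) == n,                     # one label per row
    length(unique(x$true.label)) == x$nCluster,    # labels cover nCluster groups
    length(x$noisy.label) == n_noisy,
    all(x$noisy.label %in% seq_len(n)),            # valid row indices
    !anyDuplicated(x$noisy.label)                  # each noisy row listed once
  )
  invisible(TRUE)
}

# Toy stand-in with the documented shape (all-zero data, assumed labels 1, 2, 3)
toy <- list(data = matrix(0, 60, 500), nCluster = 3,
            true.label = rep(1:3, each = 20), noisy.label = 1:10)
check_normal_dis_data(toy)
```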

Source

The dataset was generated by the code shown in the Examples section.
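Read off the code in the Examples section, the generating model can be summarized as follows. Here $x_i$ is row $i$ of `data`, $k(i)$ is the cluster of row $i$, and $\mathcal{S}$ is the set of rows listed in `noisy.label`; these symbols are introduced only for this summary.

```latex
x_i \sim
\begin{cases}
\mathcal{N}_{500}\!\left(\mu_{k(i)},\, I_{500}\right) & i \notin \mathcal{S},\\[4pt]
\mathcal{N}_{500}\!\left(\mu_{k(i)},\, 5\, I_{500}\right) & i \in \mathcal{S},
\end{cases}
\qquad k(i) = \lceil i / 20 \rceil,\quad i = 1, \dots, 60,

\mu_1 = \mathbf{0}_{500},\qquad
\mu_2 = \begin{pmatrix} -0.8\,\mathbf{1}_{50} \\ \mathbf{0}_{450} \end{pmatrix},\qquad
\mu_3 = \begin{pmatrix} 0.8\,\mathbf{1}_{50} \\ \mathbf{0}_{450} \end{pmatrix},
\qquad |\mathcal{S}| = 10.
```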

Examples

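## Not run: 
# This is the code used to generate the dataset.
set.seed(1)
library(mvtnorm)     # provides rmvnorm()
n <- 60              # sample size (20 observations per cluster)
p <- 500             # number of features
q <- 50              # number of cluster-specific features
mu <- 0.8            # size of the cluster-specific mean shift
MU <- c(0, -mu, mu)  # mean of the first q features in clusters 1, 2, 3
sigma0 <- 5          # per-feature variance of the noisy observations
# Rows 1-20: cluster 1 (mean 0); rows 21-40: cluster 2 (mean -mu on the
# first q features); rows 41-60: cluster 3 (mean +mu on the first q features).
# Clean observations have identity covariance (the rmvnorm default).
data <- rbind(rmvnorm(n/3, rep(0, p)),
              rmvnorm(n/3, c(rep(-mu, q), rep(0, p - q))),
              rmvnorm(n/3, c(rep(mu, q), rep(0, p - q))))
# Add noise to 10 randomly chosen observations: redraw each one with the
# mean of its own cluster but covariance sigma0 * I.
noisy.lab <- sample(n, 10)
for (k in 1:3) {
  # noisy rows that belong to cluster k, i.e. indices in (n/3*(k-1), n/3*k]
  check <- (noisy.lab < n*k/3 + 1) & (noisy.lab > n/3*(k - 1))
  temp.lab <- noisy.lab[check]
  num <- length(temp.lab)
  if (any(check))
    data[temp.lab, ] <- rmvnorm(num, c(rep(MU[k], q), rep(0, p - q)),
                                sigma = diag(sigma0, p))
}

## End(Not run)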
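Because both covariance matrices in the code above are diagonal, each `rmvnorm()` draw is equivalent in distribution to independent `rnorm()` draws. The sketch below, a hypothetical helper `simulate_normal_dis_data()` that is not part of SWKM, uses this to simulate from the same model in base R and assembles the result into the four-component list described under Format. It assumes the labels are coded 1, 2, 3 in row order, and it will not reproduce the stored values even after `set.seed(1)`, because it consumes random numbers differently from `rmvnorm()`.

```r
# Hypothetical base-R sketch of the same generating model (not the SWKM code).
# Draws from the same distributions but does NOT reproduce the stored data.
simulate_normal_dis_data <- function(n = 60, p = 500, q = 50, mu = 0.8,
                                     sigma0 = 5, n_noisy = 10) {
  K <- 3
  MU <- c(0, -mu, mu)                          # means of the first q features
  true.label <- rep(seq_len(K), each = n / K)  # assumed coding: 1, 2, 3 in row order
  # Mean matrix: row i has MU[true.label[i]] in the first q columns, 0 elsewhere
  means <- matrix(0, n, p)
  means[, seq_len(q)] <- MU[true.label]        # recycled down each of the q columns
  # Clean observations: unit variance in every feature (identity covariance)
  data <- means + matrix(rnorm(n * p), n, p)
  # Noisy observations: same cluster mean, variance sigma0 in every feature
  noisy.label <- sample(n, n_noisy)
  data[noisy.label, ] <- means[noisy.label, , drop = FALSE] +
    matrix(rnorm(n_noisy * p, sd = sqrt(sigma0)), n_noisy, p)
  list(data = data, nCluster = K, true.label = true.label,
       noisy.label = noisy.label)
}

set.seed(1)
sim <- simulate_normal_dis_data()
str(sim, max.level = 1)
```

Drawing the whole matrix at once and overwriting the noisy rows keeps the sketch short; the mean matrix makes the block structure of the model explicit.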

Van1yu3/SWKM documentation built on Sept. 3, 2019, 7:50 a.m.