40-generate.MixSim: Generate MixSim Examples for Testing


Description

This function uses MixSim to generate data sets for testing clustering algorithms.

Usage

  generate.MixSim(N, p, K, MixSim.obj = NULL, MaxOmega = NULL,
                  BarOmega = NULL, PiLow = 1.0, sph = FALSE, hom = FALSE)

Arguments

N

total sample size across all S processors, i.e. the N.spmd values of all processors sum to N.

p

dimension of data X.spmd, i.e. ncol(X.spmd).

K

number of clusters.

MixSim.obj

an object returned from MixSim.

MaxOmega

maximum overlap as in MixSim.

BarOmega

averaged overlap as in MixSim.

PiLow

lower bound of mixture proportion as in MixSim.

sph

covariance structure as in MixSim: TRUE for spherical clusters, FALSE for non-spherical.

hom

cluster homogeneity as in MixSim: TRUE for homogeneous clusters, FALSE for heterogeneous.

Details

If MixSim.obj is NULL, then BarOmega and MaxOmega will be used in MixSim to obtain a new MixSim.obj.
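
The fallback described above can be pictured with a small stand-alone sketch. It is not pmclust's code: resolve.model and stub.builder are hypothetical names, the stub stands in for MixSim so that the snippet runs without extra packages, and the error raised when neither overlap value is given is an assumption of this sketch rather than documented behavior.

```r
# Hypothetical helper mirroring the documented fallback; not pmclust's code.
resolve.model <- function(MixSim.obj = NULL, BarOmega = NULL, MaxOmega = NULL,
                          builder) {
  if (!is.null(MixSim.obj)) {
    return(MixSim.obj)                    # a supplied model is used as-is
  }
  if (is.null(BarOmega) && is.null(MaxOmega)) {
    # Assumption of this sketch: refuse to build without an overlap target.
    stop("supply MixSim.obj, or at least one of BarOmega and MaxOmega")
  }
  builder(BarOmega = BarOmega, MaxOmega = MaxOmega)
}

# Stub standing in for MixSim; it only records what it was asked to build.
stub.builder <- function(BarOmega, MaxOmega) {
  list(BarOmega = BarOmega, MaxOmega = MaxOmega, source = "built")
}

# No model supplied: a new one is built from the overlap values.
model.new <- resolve.model(BarOmega = 0.01, builder = stub.builder)

# Model supplied: it is returned unchanged and the overlap values are unused.
model.old <- resolve.model(MixSim.obj = list(source = "supplied"),
                           BarOmega = 0.01, builder = stub.builder)
```

In practice this suggests building one model up front and passing it as MixSim.obj whenever several data sets should come from the same true mixture.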

Value

A set of simulated data and information will be returned in a list variable including:

K             number of clusters, as the input
p             dimension of data X.spmd, as the input
N             total sample size, as the input
N.allspmds    sample sizes of all S processors
N.spmd        sample size of the given processor
X.spmd        generated data set with dimension N.spmd * p
CLASS.spmd    true cluster id of each observation, a vector of length N.spmd with values from 1 to K
N.CLASS.spmd  true sample size of each cluster, a vector of length K
MixSim.obj    the true model from which X.spmd was generated
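
The relationships among these fields can be checked mechanically. The sketch below is a minimal illustration in base R, not part of pmclust: check.generated is a hypothetical helper, and the mock list only imitates the shape of the return value on one of S = 4 processors, with made-up numbers and an assumed even split of N.

```r
# Hypothetical helper: checks the documented relationships among the fields.
check.generated <- function(ret) {
  stopifnot(
    sum(ret$N.allspmds) == ret$N,                  # per-processor sizes add up to N
    all(dim(ret$X.spmd) == c(ret$N.spmd, ret$p)),  # local data is N.spmd by p
    length(ret$CLASS.spmd) == ret$N.spmd,          # one label per local row
    all(ret$CLASS.spmd %in% seq_len(ret$K)),       # labels lie in 1..K
    length(ret$N.CLASS.spmd) == ret$K              # one count per cluster
  )
  invisible(TRUE)
}

# Mock return value for one processor; all numbers are made up.
set.seed(1)
mock <- list(
  K = 2, p = 2, N = 5000,
  N.allspmds = rep(1250, 4),                       # assumed even split over 4 processors
  N.spmd = 1250,
  X.spmd = matrix(rnorm(1250 * 2), nrow = 1250, ncol = 2),
  CLASS.spmd = sample.int(2, 1250, replace = TRUE)
)
# The mock counts its own labels; only the length K is taken from the documentation.
mock$N.CLASS.spmd <- tabulate(mock$CLASS.spmd, nbins = mock$K)

ok <- check.generated(mock)
```

With a real run, the same helper could be applied on each processor to the list returned by generate.MixSim.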

Author(s)

Wei-Chen Chen wccsnow@gmail.com and George Ostrouchov.

References

Melnykov, V., Chen, W.-C. and Maitra, R. (2012) “MixSim: Simulating Data to Study Performance of Clustering Algorithms”, Journal of Statistical Software, (accepted).

Programming with Big Data in R Website: https://pbdr.org/

See Also

generate.basic.

Examples

## Not run: 
# Save code in a file "demo.r" and run in 4 processors by
# > mpiexec -np 4 Rscript demo.r

### Setup environment.
library(pmclust, quietly = TRUE)

### Generate an example data set.
N <- 5000
p <- 2
K <- 2
data.spmd <- generate.MixSim(N, p, K, BarOmega = 0.01)
X.spmd <- data.spmd$X.spmd

### Run clustering.
PARAM.org <- set.global(K = K)          # Set global storages.
# PARAM.org <- initial.em(PARAM.org)    # One initial.
PARAM.org <- initial.RndEM(PARAM.org)   # Ten initials by default.
PARAM.new <- apecma.step(PARAM.org)     # Run APECMa.
em.update.class()                       # Get classification.

### Get results.
N.CLASS <- get.N.CLASS(K)
comm.cat("# of class:", N.CLASS, "\n")
comm.cat("# of class (true):", data.spmd$N.CLASS.spmd, "\n")

### Quit.
finalize()

## End(Not run)

pmclust documentation built on Feb. 11, 2021, 5:05 p.m.