sim_unif: Generates random variates from five multivariate uniform...
In clusteval: Evaluation of Clustering Algorithms


We generate n_m observations from each of five four-dimensional uniform distributions, arranged so that the center of each population lies at a fixed distance, delta ≥ 0, from the origin.

sim_unif(n = rep(25, 5), delta = 0, seed = NULL)

`n`	a vector (of length M = 5) of the sample sizes for each population
`delta`	the fixed distance between each population and the origin
`seed`	Seed for random number generation. (If NULL, does not set seed)

To define the populations, let x = (X_1, …, X_p)' be a multivariate uniformly distributed random vector whose components X_j ~ U(a_j, b_j) are independent uniform random variables with a_j < b_j for j = 1, …, p. Let Π_m denote the mth population (m = 1, …, 5). Then, with p = 4, we have the five populations:

Π_1 = U(-1/2, 1/2) × U(Δ - 1/2, Δ + 1/2) × U(-1/2, 1/2) × U(-1/2, 1/2),

Π_2 = U(Δ - 1/2, Δ + 1/2) × U(-1/2, 1/2) × U(-1/2, 1/2) × U(-1/2, 1/2),

Π_3 = U(-1/2, 1/2) × U(-Δ - 1/2, -Δ + 1/2) × U(-1/2, 1/2) × U(-1/2, 1/2),

Π_4 = U(-1/2, 1/2) × U(-1/2, 1/2) × U(-Δ - 1/2, -Δ + 1/2) × U(-1/2, 1/2),

Π_5 = U(-1/2, 1/2) × U(-1/2, 1/2) × U(-1/2, 1/2) × U(Δ - 1/2, Δ + 1/2).

We generate n_m observations from population Π_m.
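The sampling scheme above can be sketched in base R. This is only an illustration of the definitions as written in this section, not the package's actual implementation: `sim_unif_sketch` is a hypothetical helper, and returning `y` as an integer vector is an assumption.

```r
# Hypothetical helper (not part of clusteval): draws n[m] points from each of
# the five populations Pi_1, ..., Pi_5 as defined in this section.
sim_unif_sketch <- function(n = rep(25, 5), delta = 0, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)

  # Row m is the center of population m: one feature is shifted by +/- delta.
  centers <- rbind(
    c(0,  delta, 0, 0),   # Pi_1: feature 2 shifted by +delta
    c(delta, 0, 0, 0),    # Pi_2: feature 1 shifted by +delta
    c(0, -delta, 0, 0),   # Pi_3: feature 2 shifted by -delta
    c(0, 0, -delta, 0),   # Pi_4: feature 3 shifted by -delta
    c(0, 0, 0,  delta)    # Pi_5: feature 4 shifted by +delta
  )

  x <- do.call(rbind, lapply(seq_along(n), function(m) {
    # Draw from the unit hypercube centered at the origin, then translate it.
    u <- matrix(runif(n[m] * 4, min = -0.5, max = 0.5), nrow = n[m], ncol = 4)
    sweep(u, 2, centers[m, ], FUN = "+")
  }))

  # Population label for each row (assumed here to be an integer vector).
  y <- rep(seq_along(n), times = n)
  list(x = x, y = y)
}

sketch <- sim_unif_sketch(n = c(3, 4, 5, 6, 7), delta = 2, seed = 1)
dim(sketch$x)
table(sketch$y)
```

Translating a draw from U(-1/2, 1/2) by the center is equivalent to sampling U(c - 1/2, c + 1/2) directly, which keeps the five cases in a single lookup table.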

For Δ = 0, the M = 5 populations are identical.

Notice that the support of each population is a four-dimensional unit hypercube. Moreover, for Δ ≥ 1, the supports do not overlap, so the populations are entirely separated.
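The separation claim can be checked numerically from the population centers. The sketch below reads the centers off the definitions in this section (the row names `Pi_1`, …, `Pi_5` are labels chosen here for illustration) and uses the fact that two axis-aligned unit hypercubes cannot overlap in volume once their centers differ by at least 1 in some coordinate.

```r
delta <- 1

# Centers of the five supports, taken from the definitions of Pi_1, ..., Pi_5.
centers <- rbind(
  Pi_1 = c(0,  delta, 0, 0),
  Pi_2 = c(delta, 0, 0, 0),
  Pi_3 = c(0, -delta, 0, 0),
  Pi_4 = c(0, 0, -delta, 0),
  Pi_5 = c(0, 0, 0,  delta)
)

# Distance of each center from the origin.
origin_dist <- sqrt(rowSums(centers^2))

# Pairwise Euclidean distances between centers.
center_dist <- as.matrix(dist(centers))

# Largest per-coordinate gap between each pair of centers (Chebyshev distance).
# A gap of at least 1 means the two unit hypercubes share at most a boundary.
max_coord_gap <- as.matrix(dist(centers, method = "maximum"))
separated <- all(max_coord_gap[upper.tri(max_coord_gap)] >= 1)

origin_dist
round(center_dist, 3)
separated
```

Under these definitions every center sits at distance Δ from the origin, while the distance between two centers depends on the pair: Π_1 and Π_3 are shifted in opposite directions along the same feature, so they are farther apart than pairs shifted along different features.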

A named list containing:

`x`	A matrix whose rows are the generated observations and whose columns are the p features (variables)
`y`	A vector denoting the population from which the observation in each row was generated

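library(clusteval)

# Defaults: 25 observations from each of the 5 populations, with delta = 0
data_generated <- sim_unif(seed = 42)
dim(data_generated$x)
table(data_generated$y)

# Unequal sample sizes (10, 20, ..., 50) with delta = 1.5
data_generated2 <- sim_unif(n = 10 * seq_len(5), delta = 1.5)
table(data_generated2$y)

# Sample mean of each feature within each population (one row per population)
sample_means <- with(data_generated2,
                     tapply(seq_along(y), y, function(i) {
                            colMeans(x[i, , drop = FALSE])
                     }))
(sample_means <- do.call(rbind, sample_means))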