# sim_unif: Generates random variates from five multivariate uniform... In clusteval: Evaluation of Clustering Algorithms

## Description

We generate n_m observations from each of five four-dimensional (p = 4) multivariate uniform distributions. The center of each population lies at a fixed Euclidean distance, delta ≥ 0, from the origin.

## Usage

    sim_unif(n = rep(25, 5), delta = 0, seed = NULL)

## Arguments

n:

A vector (of length M = 5) of the sample sizes for each population.

delta:

The fixed distance between each population and the origin.

seed:

Seed for random number generation. (If NULL, does not set seed.)

## Details

To define the populations, let x = (X_1, …, X_p)' be a multivariate uniformly distributed random vector whose components X_j ∼ U(a_j, b_j), with a_j < b_j, are independent uniform random variables for j = 1, …, p. Here p = 4. Let Π_m denote the mth population (m = 1, …, 5), and let Δ denote the value of the delta argument. Then, we have the five populations:

Π_1 = U(-1/2, 1/2) \times U(Δ - 1/2, Δ + 1/2) \times U(-1/2, 1/2) \times U(-1/2, 1/2),

Π_2 = U(Δ - 1/2, Δ + 1/2) \times U(-1/2, 1/2) \times U(-1/2, 1/2) \times U(-1/2, 1/2),

Π_3 = U(-1/2, 1/2) \times U(-Δ - 1/2, -Δ + 1/2) \times U(-1/2, 1/2) \times U(-1/2, 1/2),

Π_4 = U(-1/2, 1/2) \times U(-1/2, 1/2) \times U(-Δ - 1/2, -Δ + 1/2) \times U(-1/2, 1/2),

Π_5 = U(-1/2, 1/2) \times U(-1/2, 1/2) \times U(-1/2, 1/2) \times U(Δ - 1/2, Δ + 1/2).

We generate n_m observations from population Π_m.
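The sampling scheme defined above can be sketched in base R. This is a minimal illustration, not the clusteval source: the name `sim_unif_sketch` and the plain integer labels in `y` are assumptions made for the example, and the centers are read directly off the five definitions.

```r
# Hypothetical helper (not part of clusteval): draws n[m] points from each of
# the five populations defined above, using only base R's runif().
sim_unif_sketch <- function(n = rep(25, 5), delta = 0, seed = NULL) {
  stopifnot(length(n) == 5, delta >= 0)
  if (!is.null(seed)) set.seed(seed)

  # Row m holds the center of population m (one shifted coordinate each).
  centers <- rbind(
    c(0,      delta,  0,      0),
    c(delta,  0,      0,      0),
    c(0,     -delta,  0,      0),
    c(0,      0,     -delta,  0),
    c(0,      0,      0,      delta)
  )

  x <- do.call(rbind, lapply(seq_len(5), function(m) {
    # Feature j is U(center_j - 1/2, center_j + 1/2), drawn independently.
    # matrix() fills column by column, so the bounds are repeated per column.
    lo <- rep(centers[m, ] - 1/2, each = n[m])
    hi <- rep(centers[m, ] + 1/2, each = n[m])
    matrix(runif(n[m] * 4, min = lo, max = hi), nrow = n[m], ncol = 4)
  }))

  # Assumed label format: integer population index for each row of x.
  list(x = x, y = rep(seq_len(5), times = n))
}
```

Calling `sim_unif_sketch(n = c(3, 4, 5, 6, 7), delta = 2, seed = 1)` returns a 25 × 4 matrix `x` together with 25 labels in `y`. The random draws need not match those of `sim_unif` for the same seed.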

For Δ = 0, the M = 5 populations are equal.

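Notice that the support of each population is a unit hypercube in p = 4 dimensions (one per feature). Moreover, for Δ > 1, the supports are disjoint, so the populations are entirely separated; at Δ = 1 the supports meet only on their boundaries, a set of probability zero.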
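The geometry of the five supports can be checked numerically. The sketch below assumes the centers implied by the population definitions and an illustrative value of Δ = 1.5; the variable names are chosen for the example only. It computes each center's distance from the origin, the pairwise distances between centers, and whether every pair of unit hypercubes is separated along at least one coordinate.

```r
delta <- 1.5  # illustrative value; any delta >= 0 can be substituted

# Centers of the five unit hypercubes, read off the population definitions.
centers <- rbind(
  Pi_1 = c(0,      delta,  0,      0),
  Pi_2 = c(delta,  0,      0,      0),
  Pi_3 = c(0,     -delta,  0,      0),
  Pi_4 = c(0,      0,     -delta,  0),
  Pi_5 = c(0,      0,      0,      delta)
)

# Euclidean distance of each center from the origin.
center_norms <- sqrt(rowSums(centers^2))

# Pairwise Euclidean distances between the centers.
pairwise <- as.matrix(dist(centers))

# Two unit hypercubes are disjoint when their centers differ by more than 1
# along at least one coordinate, i.e. when the maximum-norm distance exceeds 1.
max_gap <- as.matrix(dist(centers, method = "maximum"))
separated <- all(max_gap[upper.tri(max_gap)] > 1)

print(center_norms)
print(round(pairwise, 3))
print(separated)
```

Under these definitions every center lies at distance Δ from the origin. The centers of Π_1 and Π_3 are 2Δ apart, because they shift the same coordinate in opposite directions, while every other pair is √2·Δ apart.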

## Value

named list containing:

x:

A matrix whose rows are the observations generated and whose columns are the p features (variables)

y:

A vector denoting the population from which the observation in each row was generated.

## Examples

    data_generated <- sim_unif(seed = 42)
    dim(data_generated$x)
    table(data_generated$y)

    data_generated2 <- sim_unif(n = 10 * seq_len(5), delta = 1.5)
    table(data_generated2$y)
    sample_means <- with(data_generated2,
                         tapply(seq_along(y), y, function(i) {
                           colMeans(x[i,])
                         }))
    (sample_means <- do.call(rbind, sample_means))

clusteval documentation built on May 29, 2017, 11:45 p.m.