Generates random variates from five multivariate uniform populations.

Description

We generate n_m observations from each of five 4-dimensional uniform populations, where the center of each population lies at a fixed Euclidean distance, delta ≥ 0, from the origin.

Usage

  sim_unif(n = rep(25, 5), delta = 0, seed = NULL)

Arguments

n

a vector (of length M = 5) of the sample sizes for each population

delta

the fixed distance (delta ≥ 0) between the center of each population and the origin

seed

Seed for random number generation. (If NULL, does not set seed)

Details

To define the populations, let x = (X_1, …, X_p)' be a multivariate uniformly distributed random vector such that X_j ~ U(a_j, b_j) is an independently distributed uniform random variable with a_j < b_j for j = 1, …, p. Let Π_m denote the mth population (m = 1, …, 5). Then, we have the five populations:

Π_1 = U(-1/2, 1/2) × U(Δ - 1/2, Δ + 1/2) × U(-1/2, 1/2) × U(-1/2, 1/2),

Π_2 = U(Δ - 1/2, Δ + 1/2) × U(-1/2, 1/2) × U(-1/2, 1/2) × U(-1/2, 1/2),

Π_3 = U(-1/2, 1/2) × U(-Δ - 1/2, -Δ + 1/2) × U(-1/2, 1/2) × U(-1/2, 1/2),

Π_4 = U(-1/2, 1/2) × U(-1/2, 1/2) × U(-Δ - 1/2, -Δ + 1/2) × U(-1/2, 1/2),

Π_5 = U(-1/2, 1/2) × U(-1/2, 1/2) × U(-1/2, 1/2) × U(Δ - 1/2, Δ + 1/2).

We generate n_m observations from population Π_m.
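
The construction above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the helper name `sim_unif_sketch`, the `centers` matrix, and the choice to return `y` as a factor are all introduced here for the example.

```r
# Hypothetical helper (not a package function): draws from the five
# populations by shifting unit-hypercube draws to each population's center.
sim_unif_sketch <- function(n = rep(25, 5), delta = 0, seed = NULL) {
  stopifnot(length(n) == 5, delta >= 0)
  if (!is.null(seed)) set.seed(seed)

  # Row m holds the center of population m, read off the definitions above.
  centers <- delta * rbind(
    c( 0,  1,  0, 0),
    c( 1,  0,  0, 0),
    c( 0, -1,  0, 0),
    c( 0,  0, -1, 0),
    c( 0,  0,  0, 1)
  )

  x <- do.call(rbind, lapply(seq_len(5), function(m) {
    # U(-1/2, 1/2) draws for each of the 4 features, shifted by the center.
    u <- matrix(runif(n[m] * 4, min = -0.5, max = 0.5), nrow = n[m], ncol = 4)
    sweep(u, 2, centers[m, ], "+")
  }))

  # Assumption: population labels are returned as a factor.
  list(x = x, y = factor(rep(seq_len(5), times = n)))
}

sketch <- sim_unif_sketch(n = rep(10, 5), delta = 2, seed = 1)
dim(sketch$x)
table(sketch$y)
```

Each row of `sketch$x` stays within 1/2 of its population's center in every feature, which matches the unit-hypercube supports defined above.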

For Δ = 0, the M = 5 populations are identical.

Notice that the support of each population is a unit hypercube in p = 4 dimensions. Moreover, for Δ ≥ 1, the supports are pairwise disjoint (sharing at most a boundary when Δ = 1), so the populations are entirely separated.
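
The separation claim can be checked numerically: two unit hypercubes have disjoint interiors whenever their centers differ by at least 1 in some coordinate. The sketch below assumes the population centers implied by the definitions in this section; `pop_centers` and `separated` are hypothetical helpers written for this illustration, not package functions.

```r
# Centers of the five populations for a given delta (hypothetical helper,
# read off the population definitions in this section).
pop_centers <- function(delta) {
  delta * rbind(
    c( 0,  1,  0, 0),
    c( 1,  0,  0, 0),
    c( 0, -1,  0, 0),
    c( 0,  0, -1, 0),
    c( 0,  0,  0, 1)
  )
}

# TRUE when every pair of centers differs by at least 1 in some coordinate,
# i.e. the pairwise maximum (Chebyshev) distance is >= 1.
separated <- function(delta) {
  d <- as.matrix(dist(pop_centers(delta), method = "maximum"))
  all(d[upper.tri(d)] >= 1)
}

separated(0.5)  # FALSE: some supports overlap
separated(1)    # TRUE: supports meet only on their boundaries
separated(1.5)  # TRUE

# Euclidean distances between the centers, for reference.
round(as.matrix(dist(pop_centers(1))), 3)
```

The final matrix also shows that the centers are not all equidistant from one another, even though each lies at distance Δ from the origin.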

Value

named list containing:

x:

A matrix whose rows are the observations generated and whose columns are the p features (variables)

y:

A vector denoting the population from which the observation in each row was generated.

Examples

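# Default call: 25 observations from each of the 5 populations, delta = 0
data_generated <- sim_unif(seed = 42)
dim(data_generated$x)
table(data_generated$y)

# Unequal sample sizes (10, 20, ..., 50) with well-separated populations
data_generated2 <- sim_unif(n = 10 * seq_len(5), delta = 1.5)
table(data_generated2$y)
# Per-population sample means of the features
sample_means <- with(data_generated2,
                     tapply(seq_along(y), y, function(i) {
                            colMeans(x[i, , drop = FALSE])
                     }))
(sample_means <- do.call(rbind, sample_means))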