simdata_uniform: Generates random variates from multivariate uniform...


View source: R/simdata-uniform.r

Description

We generate n_k observations from each of K_0 multivariate uniform distributions whose centroids all lie at the same Euclidean distance Δ ≥ 0 from the origin.

Usage

  simdata_uniform(n = rep(25, 5), delta = 0, seed = NULL)

Arguments

n

a vector (of length K_0) of the sample sizes for each population

delta

the fixed distance between each population and the origin

seed

seed for random number generation. (If NULL, does not set seed)

Details

To define the populations, let x = (X_1, …, X_p)' be a multivariate uniformly distributed random vector such that X_j \sim U(a_j^{(k)}, b_j^{(k)}) is an independently distributed uniform random variable with a_j^{(k)} < b_j^{(k)} for j = 1, …, p.

For each population, we set the mean of the distribution along one feature to Δ, while the remaining features have mean 0. The objective is to have unit hypercubes with p = K_0 where the population centroids separate from each other in orthogonal directions as Δ increases, with no overlap for Δ ≥ 1.

Hence, let (a_k^{(k)}, b_k^{(k)}) = (Δ - 1/2, Δ + 1/2). For the remaining ordered pairs, let (a_j^{(k)}, b_j^{(k)}) = (-1/2, 1/2).

We generate n_k observations from the kth population.
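The construction above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's actual source: the name sketch_uniform is hypothetical, and returning y as a plain integer vector is an assumption. It relies only on runif(), which recycles its min and max arguments across draws.

```r
# Hypothetical helper (not part of sortinghat): draws n[k] points from the
# k-th unit hypercube, whose k-th feature is centred at delta.
sketch_uniform <- function(n = rep(25, 5), delta = 0, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  K <- length(n)  # number of populations K_0, also the number of features p

  x <- do.call(rbind, lapply(seq_len(K), function(k) {
    # Bounds (a_j, b_j) for population k: (-1/2, 1/2) except along feature k
    lower <- rep(-0.5, K)
    upper <- rep(0.5, K)
    lower[k] <- delta - 0.5
    upper[k] <- delta + 0.5
    # matrix() fills column by column, so column j uses bounds lower[j], upper[j]
    matrix(runif(n[k] * K,
                 min = rep(lower, each = n[k]),
                 max = rep(upper, each = n[k])),
           nrow = n[k], ncol = K)
  }))

  # Assumed label format: integer population index for each row of x
  y <- rep(seq_len(K), times = n)
  list(x = x, y = y)
}
```

Calling sketch_uniform(n = c(3, 4, 5), delta = 2) would give a 12 x 3 matrix in which the first three rows have their first feature in [1.5, 2.5] and the other features in [-0.5, 0.5].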

For Δ = 0, the K_0 populations are identically distributed (K_0 = 5 under the default n).

Notice that the support of each population is a unit hypercube with p = K_0 features. Moreover, for Δ ≥ 1, the supports of distinct populations do not overlap, so the populations are entirely separated.
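The separation threshold can be checked along a single feature. The short argument below uses only the bounds defined in this section; the symbols \mu_k (mean of population k) and e_k (kth standard basis vector) are shorthand introduced here for illustration.

```latex
% Population k has mean \mu_k = \Delta e_k, so every centroid is at distance
% \Delta from the origin and any two centroids are \sqrt{2}\,\Delta apart.
\[
\lVert \mu_k \rVert = \Delta, \qquad
\lVert \mu_k - \mu_l \rVert = \sqrt{2}\,\Delta \quad (k \neq l).
\]
% Along feature k, population k is supported on [\Delta - 1/2, \Delta + 1/2]
% and any other population l on [-1/2, 1/2]. For \Delta \ge 0:
\[
\left[\Delta - \tfrac12,\ \Delta + \tfrac12\right] \cap \left[-\tfrac12,\ \tfrac12\right] \neq \emptyset
\iff \Delta - \tfrac12 \le \tfrac12
\iff \Delta \le 1 .
\]
```

At Δ = 1 the two intervals share only the boundary point 1/2, which has probability zero, so the populations are separated with probability one from Δ = 1 onward.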

Value

named list containing:

x:

A matrix whose rows are the observations generated and whose columns are the p features (variables)

y:

A vector denoting the population from which the observation in each row was generated.

Examples

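# Default call: 5 populations of 25 observations each, identical at delta = 0
data_generated <- simdata_uniform(seed = 42)
dim(data_generated$x)
table(data_generated$y)

# Unequal sample sizes (10, 20, ..., 50) with well-separated populations
data_generated2 <- simdata_uniform(n = 10 * seq_len(5), delta = 1.5)
table(data_generated2$y)
# Per-population sample means: feature k of population k should be near delta
sample_means <- with(data_generated2,
                     tapply(seq_along(y), y, function(i) {
                            colMeans(x[i,])
                     }))
(sample_means <- do.call(rbind, sample_means))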

sortinghat documentation built on May 30, 2017, 4:52 a.m.