sim_normal: Generates random variates from multivariate normal...


Description

Generates n_k observations (k = 1, …, K_0) from each of K_0 multivariate normal distributions. The mean vectors all lie at the same Euclidean distance from the origin, and that distance scales with Δ ≥ 0.

Usage

sim_normal(n = rep(25, 5), p = 50, rho = rep(0.9, 5), delta = 0,
  sigma2 = 1, seed = NULL)

Arguments

n

a vector (of length K_0) of the sample sizes for each population

p

the dimension of the multivariate normal populations

rho

a vector (of length K_0) of the intraclass correlation constants ρ_k, one for each population

delta

the scaling factor Δ ≥ 0 for the population means; every mean lies at the same Euclidean distance, Δ √(p/K_0), from the origin

sigma2

the variance coefficient σ^2 that scales each intraclass covariance matrix

seed

seed for random number generation (if NULL, no seed is set)

Details

Let Π_k denote the kth population with a p-dimensional multivariate normal distribution, N_p(μ_k, Σ_k) with mean vector μ_k and covariance matrix Σ_k. Also, let e_k be the kth standard basis vector (i.e., the kth element is 1 and the remaining values are 0). Then, we define

μ_k = Δ ∑_{j=1}^{p/K_0} e_{(p/K_0)(k-1) + j}.

Note that p must be divisible by K_0. With the default values p = 50 and K_0 = 5, elements 1 through 10 of μ_1 are set to Δ with all remaining elements set to 0, elements 11 through 20 of μ_2 are set to Δ with all remaining elements set to 0, and so on.
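The block structure of the means can be sketched in base R as follows. The helper name `make_means` is hypothetical and is not part of clusteval; it only mirrors the definition of μ_k given above and confirms that every mean has norm Δ √(p/K_0).

```r
# Hypothetical helper (not part of clusteval): one row per population mean.
make_means <- function(K0, p, delta) {
  stopifnot(p %% K0 == 0)            # p must be divisible by K_0
  block <- p / K0                    # number of nonzero elements per mean
  mu_cols <- vapply(seq_len(K0), function(k) {
    mu <- numeric(p)
    mu[block * (k - 1) + seq_len(block)] <- delta
    mu
  }, numeric(p))
  t(mu_cols)                         # K0 x p matrix
}

means <- make_means(K0 = 5, p = 50, delta = 2)
dim(means)                           # K0 rows, p columns
which(means[2, ] != 0)               # nonzero block of the second mean
sqrt(rowSums(means^2))               # each equals delta * sqrt(p / K0)
```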

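Also, we consider intraclass covariance (correlation) matrices such that Σ_k = σ^2 [(1 - ρ_k) I_p + ρ_k J_p], where -(p-1)^{-1} < ρ_k < 1, I_p is the p × p identity matrix, and J_p denotes the p × p matrix of ones. Under this form, every variable has variance σ^2 and every pair of distinct variables has correlation ρ_k; the bound on ρ_k is exactly the condition for Σ_k to be positive definite.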
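A minimal sketch of such a matrix, assuming the standard intraclass form σ^2 [(1 - ρ) I_p + ρ J_p], which is the form consistent with the stated bound on ρ_k. The function name `intraclass_cov_sketch` is hypothetical and may differ from whatever clusteval uses internally.

```r
# Hypothetical helper (not part of clusteval).
intraclass_cov_sketch <- function(p, rho, sigma2 = 1) {
  # Positive definiteness requires -1 / (p - 1) < rho < 1.
  stopifnot(rho > -1 / (p - 1), rho < 1, sigma2 > 0)
  sigma2 * ((1 - rho) * diag(p) + rho * matrix(1, p, p))
}

Sigma <- intraclass_cov_sketch(p = 4, rho = 0.9, sigma2 = 2)
Sigma[1, 1]                            # variance: sigma2
Sigma[1, 2]                            # covariance: sigma2 * rho
# Eigenvalues: sigma2 * (1 + (p - 1) * rho) once, sigma2 * (1 - rho) p - 1 times.
eigen(Sigma, symmetric = TRUE)$values
```

All eigenvalues are positive exactly when ρ lies inside the stated interval, which is why the helper rejects values outside it.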

By default, we let K_0 = 5, Δ = 0, and σ^2 = 1. Furthermore, we generate 25 observations from each population by default.

For Δ = 0 and ρ_k = ρ, k = 1, …, K_0, the K_0 populations are equal.
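Putting the pieces together, the data-generating model described above can be sketched in base R. This is an illustration under stated assumptions, not the clusteval source: the name `sim_normal_sketch` is hypothetical, the covariance is taken to be σ^2 [(1 - ρ_k) I_p + ρ_k J_p], sampling uses a Cholesky factor rather than any particular package routine, and `y` is returned as an integer vector (the type of `y` is not specified in this page).

```r
# Hypothetical base-R sketch of the model (not the clusteval implementation).
sim_normal_sketch <- function(n = rep(25, 5), p = 50, rho = rep(0.9, 5),
                              delta = 0, sigma2 = 1, seed = NULL) {
  K0 <- length(n)
  stopifnot(length(rho) == K0, p %% K0 == 0)
  if (!is.null(seed)) set.seed(seed)
  block <- p / K0
  x <- do.call(rbind, lapply(seq_len(K0), function(k) {
    mu <- numeric(p)
    mu[block * (k - 1) + seq_len(block)] <- delta
    Sigma <- sigma2 * ((1 - rho[k]) * diag(p) + rho[k] * matrix(1, p, p))
    # chol() returns upper-triangular R with t(R) %*% R == Sigma, so rows of
    # z %*% R have covariance Sigma when rows of z are N_p(0, I_p).
    z <- matrix(rnorm(n[k] * p), nrow = n[k], ncol = p)
    sweep(z %*% chol(Sigma), 2, mu, "+")
  }))
  list(x = x, y = rep(seq_len(K0), times = n))
}

sim <- sim_normal_sketch(n = c(200, 300), p = 4, rho = c(0.2, 0.7),
                         delta = 3, seed = 1)
dim(sim$x)                                   # sum(n) rows, p columns
table(sim$y)                                 # n_k observations per population
round(colMeans(sim$x[sim$y == 1, ]), 1)      # close to (delta, delta, 0, 0)
```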

Value

named list containing:

x:

A matrix whose rows are the observations generated and whose columns are the p features (variables)

y:

A vector denoting the population from which the observation in each row was generated.

Examples

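# Unequal sample sizes (10, 20, ..., 50) with the default p, rho, and delta
data_generated <- sim_normal(n = 10 * seq_len(5), seed = 42)
dim(data_generated$x)
table(data_generated$y)

# Separated populations: with p = 10 and K_0 = 5, each mean has 2 elements equal to delta
data_generated2 <- sim_normal(p = 10, delta = 2, rho = rep(0.1, 5))
table(data_generated2$y)
# Sample mean vector for each population, one row per population
sample_means <- with(data_generated2,
                     tapply(seq_along(y), y, function(i) {
                            colMeans(x[i, , drop = FALSE])
                     }))
(sample_means <- do.call(rbind, sample_means))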

ramhiser/clusteval documentation built on May 26, 2019, 10:07 p.m.