sim_normal: Generates random variates from multivariate normal...

View source: R/sim_normal.r

Description

Generates n_m observations (m = 1, …, M) from each of M multivariate normal distributions. The population means all lie at the same Euclidean distance from the origin, and that distance scales with Δ ≥ 0.

Usage

  sim_normal(n = rep(25, 5), p = 50, rho = rep(0.9, 5),
    delta = 0, sigma2 = 1, seed = NULL)

Arguments

n

a vector (of length M) of the sample sizes for each population

p

the dimension of the multivariate normal populations

rho

a vector (of length M) of the intraclass constants for each population

delta

the value Δ ≥ 0 assigned to each nonzero coordinate of a population mean; the Euclidean distance between each mean and the origin is then Δ √(p/M)

sigma2

the common variance σ^2 that scales each intraclass covariance matrix

seed

seed for random number generation (if NULL, no seed is set)

Details

Let Π_m denote the mth population with a p-dimensional multivariate normal distribution, N_p(μ_m, Σ_m) with mean vector μ_m and covariance matrix Σ_m. Also, let e_m be the mth standard basis vector (i.e., the mth element is 1 and the remaining values are 0). Then, we define

μ_m = Δ ∑_{j=1}^{p/M} e_{(p/M)(m-1) + j}.

Note that p must be divisible by M. With the defaults (p = 50 and M = 5), coordinates 1 to 10 of μ_1 are set to delta and all remaining coordinates to 0, coordinates 11 to 20 of μ_2 are set to delta and all remaining coordinates to 0, and so on.
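
The block structure of the mean vectors can be sketched in a few lines of base R. The helper name block_mean is hypothetical and is not part of clusteval; it only illustrates the formula for μ_m, using delta = 2 instead of the default 0 so that the nonzero block is visible.

```r
# Hypothetical helper (not part of clusteval): mean vector of population m.
block_mean <- function(m, M, p, delta) {
  stopifnot(p %% M == 0, m >= 1, m <= M)
  block <- p / M                          # number of nonzero coordinates
  mu <- numeric(p)
  mu[(block * (m - 1) + 1):(block * m)] <- delta
  mu
}

# Defaults from Usage (M = 5, p = 50), with delta = 2 for illustration.
mu <- sapply(1:5, block_mean, M = 5, p = 50, delta = 2)  # p x M matrix
which(mu[, 2] != 0)     # coordinates 11 through 20
sqrt(colSums(mu^2))     # every mean has norm delta * sqrt(p / M)
```

Because each mean has exactly p/M coordinates equal to delta, all M means sit at the same distance from the origin, which is what the Description states.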

Also, we consider intraclass covariance (correlation) matrices such that Σ_m = σ^2 [(1 - ρ_m) I_p + ρ_m J_p], where -(p-1)^{-1} < ρ_m < 1, I_p is the p × p identity matrix, and J_p denotes the p × p matrix of ones. The bounds on ρ_m ensure that Σ_m is positive definite.
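
As a minimal sketch, assuming the standard intraclass form σ^2 [(1 - ρ) I_p + ρ J_p], the matrix can be built and checked in base R. The helper name intraclass_cov is used here for illustration only and should not be read as the package's own implementation.

```r
# Illustrative helper (an assumption, not clusteval's code):
# intraclass covariance matrix with common variance sigma2 and correlation rho.
intraclass_cov <- function(p, rho, sigma2 = 1) {
  stopifnot(rho > -1 / (p - 1), rho < 1, sigma2 > 0)
  sigma2 * ((1 - rho) * diag(p) + rho * matrix(1, nrow = p, ncol = p))
}

Sigma <- intraclass_cov(p = 4, rho = 0.9)
Sigma                                        # 1 on the diagonal, 0.9 elsewhere
ev <- eigen(Sigma, symmetric = TRUE)$values
ev                                           # 1 + (p - 1) * rho once, 1 - rho otherwise
```

The eigenvalues are σ^2 [1 + (p - 1) ρ] with multiplicity 1 and σ^2 (1 - ρ) with multiplicity p - 1, so both are positive exactly when -(p-1)^{-1} < ρ < 1.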

By default, we let M = 5, Δ = 0, and σ^2 = 1. Furthermore, we generate 25 observations from each population by default.
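
To make the data-generating model concrete, the following base R sketch draws from it under the default settings. It is an illustration only: rmvn_chol is a hypothetical helper, the Cholesky-based sampler is one common way to draw multivariate normal variates, and clusteval's internals (including its random number stream and the type of y, taken here to be a factor) may differ.

```r
# Sketch only; clusteval's internals may differ. Draws n rows from
# N_p(mu, Sigma) using the Cholesky factor of Sigma.
rmvn_chol <- function(n, mu, Sigma) {
  p <- length(mu)
  z <- matrix(rnorm(n * p), nrow = n, ncol = p)  # i.i.d. N(0, 1) entries
  x <- z %*% chol(Sigma)                         # rows now have covariance Sigma
  sweep(x, 2, mu, "+")                           # shift each column by its mean
}

set.seed(42)
M <- 5; p <- 50; n <- rep(25, M); rho <- rep(0.9, M)
delta <- 0; sigma2 <- 1

x <- do.call(rbind, lapply(seq_len(M), function(m) {
  mu <- numeric(p)
  mu[(p / M * (m - 1) + 1):(p / M * m)] <- delta
  Sigma <- sigma2 * ((1 - rho[m]) * diag(p) + rho[m] * matrix(1, p, p))
  rmvn_chol(n[m], mu, Sigma)
}))
y <- factor(rep(seq_len(M), times = n))  # population label of each row (assumed type)

dim(x)      # 125 rows, 50 columns
table(y)    # 25 observations per population
```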

For Δ = 0 and ρ_m = ρ, m = 1, …, M, the M populations are equal.

Value

named list containing:

x:

A matrix whose rows are the observations generated and whose columns are the p features (variables)

y:

A vector denoting the population from which the observation in each row was generated.

Examples

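# Unequal sample sizes (10, 20, ..., 50) with a fixed seed
data_generated <- sim_normal(n = 10 * seq_len(5), seed = 42)
dim(data_generated$x)
table(data_generated$y)

# Nonzero means (delta = 2) and weak intraclass correlation in p = 10 dimensions
data_generated2 <- sim_normal(p = 10, delta = 2, rho = rep(0.1, 5))
table(data_generated2$y)
# Sample mean vector of each population, one row per population
sample_means <- with(data_generated2,
                     tapply(seq_along(y), y, function(i) {
                            colMeans(x[i, , drop = FALSE])
                     }))
(sample_means <- do.call(rbind, sample_means))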

clusteval documentation built on May 2, 2019, 9:18 a.m.