We generate n_k observations (k = 1, …, K_0) from each of K_0 multivariate normal distributions whose mean vectors all lie at the same Euclidean distance from the origin, with that distance scaled by Δ ≥ 0.
sim_normal(n = rep(25, 5), p = 50, rho = rep(0.9, 5), delta = 0,
  sigma2 = 1, seed = NULL)
n: a vector (of length K_0) of the sample sizes for each population
p: the dimension of the multivariate normal populations
rho: a vector (of length K_0) of the intraclass constants for each population
delta: the fixed distance between each population and the origin
sigma2: the coefficient of each intraclass covariance matrix
seed: seed for random number generation (If
Let Π_k denote the kth population with a p-dimensional multivariate normal distribution, N_p(μ_k, Σ_k) with mean vector μ_k and covariance matrix Σ_k. Also, let e_k be the kth standard basis vector (i.e., the kth element is 1 and the remaining values are 0). Then, we define
μ_k = Δ ∑_{j=1}^{p/K_0} e_{(p/K_0)(k-1) + j}.
Note that p must be divisible by K_0. By default, the first 10 dimensions of μ_1 are set to Δ with all remaining dimensions set to 0, the second 10 dimensions of μ_2 are set to Δ with all remaining dimensions set to 0, and so on.
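The block structure of the mean vectors can be sketched in a few lines of R. This is only an illustration of the definition above, not the package's own code: the helper name `block_means` and its argument `K0` are hypothetical.

```r
# Hypothetical helper (not part of the package): builds the K_0 mean vectors
# mu_k = delta * sum_{j=1}^{p/K_0} e_{(p/K_0)(k-1)+j}, one per row.
block_means <- function(p, K0, delta) {
  stopifnot(p %% K0 == 0)            # p must be divisible by K_0
  block <- p / K0                    # number of nonzero entries per mean
  means <- matrix(0, nrow = K0, ncol = p)
  for (k in seq_len(K0)) {
    idx <- (block * (k - 1) + 1):(block * k)
    means[k, idx] <- delta           # kth block of coordinates set to delta
  }
  means
}

mu <- block_means(p = 50, K0 = 5, delta = 2)
which(mu[2, ] != 0)      # nonzero positions of mu_2
sqrt(rowSums(mu^2))      # Euclidean distance of each mean from the origin
```

Under this construction every mean has p/K_0 entries equal to Δ, so each lies at distance Δ √(p/K_0) from the origin, which is why the distances are equal across populations.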
Also, we consider intraclass covariance (correlation) matrices such that Σ_k = σ^2 [(1 - ρ_k) I_p + ρ_k J_p], where -(p-1)^{-1} < ρ_k < 1, I_p is the p × p identity matrix, and J_p denotes the p × p matrix of ones.
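As a minimal sketch, assuming the standard intraclass form σ^2 [(1 - ρ) I_p + ρ J_p] (variance σ^2 on the diagonal, covariance σ^2 ρ off the diagonal), such a matrix can be built and checked in base R. The function name `intraclass_cov_sketch` is hypothetical and is not claimed to match the package's internals.

```r
# Hypothetical helper: intraclass covariance matrix under the assumed form
# sigma2 * ((1 - rho) * I_p + rho * J_p).
intraclass_cov_sketch <- function(p, rho, sigma2 = 1) {
  # The bound -1/(p - 1) < rho < 1 keeps the matrix positive definite.
  stopifnot(rho > -1 / (p - 1), rho < 1)
  sigma2 * ((1 - rho) * diag(p) + rho * matrix(1, nrow = p, ncol = p))
}

Sigma <- intraclass_cov_sketch(p = 4, rho = 0.9)
Sigma[1, 1]   # diagonal entry: sigma2
Sigma[1, 2]   # off-diagonal entry: sigma2 * rho
min(eigen(Sigma, symmetric = TRUE)$values) > 0   # positive definiteness check
```

The eigenvalues of this form are σ^2 (1 - ρ), repeated p - 1 times, and σ^2 (1 + (p - 1) ρ), which is where the bound on ρ comes from.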
By default, we let K_0 = 5, Δ = 0, and σ^2 = 1. Furthermore, we generate 25 observations from each population by default.
For Δ = 0 and ρ_k = ρ, k = 1, …, K_0, the K_0 populations are equal.
A named list containing:
x: A matrix whose rows are the observations generated and whose columns are the p features (variables).
y: A vector denoting the population from which the observation in each row was generated.
data_generated <- sim_normal(n = 10 * seq_len(5), seed = 42)
dim(data_generated$x)
table(data_generated$y)
data_generated2 <- sim_normal(p = 10, delta = 2, rho = rep(0.1, 5))
table(data_generated2$y)
sample_means <- with(data_generated2,
                     tapply(seq_along(y), y, function(i) {
                       colMeans(x[i, ])
                     }))
(sample_means <- do.call(rbind, sample_means))