This function generates K multivariate normal data sets, where each
class is generated with a constant mean vector and an intraclass covariance
matrix. The data are returned as a single matrix x along with a vector
of class labels y that indicates class membership.
generate_intraclass(n, p, rho, mu, sigma2 = rep(1, K))

n
vector of the sample sizes of each class. The length of n determines the number of classes K.
p
the number of features (variables) in the data
rho
vector of the values of the off-diagonal elements for each intraclass covariance matrix. Must equal the length of n.
mu
vector containing the mean for each class. Must equal the length of n.
sigma2
vector of variances for each class. Must equal the length of n.
For simplicity, we assume that a class mean vector is constant for each feature. That is, we assume that the mean vector of the kth class is c_k * j_p, where j_p is a p \times 1 vector of ones and c_k is a real scalar.
The intraclass covariance matrix for the kth class is defined as:
σ_k^2 * (ρ_k * J_p + (1 - ρ_k) * I_p),
where J_p is the p \times p matrix of ones and I_p is the p \times p identity matrix.
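The definition above can be sketched in a few lines of base R. The helper name `intraclass_cov` is hypothetical and is used only for illustration; it is not claimed to be part of the package.

```r
# Hypothetical helper (an assumption, not the package's code): builds
# sigma2 * (rho * J_p + (1 - rho) * I_p) for a single class.
intraclass_cov <- function(p, rho, sigma2 = 1) {
  J <- matrix(1, nrow = p, ncol = p)  # p x p matrix of ones
  I <- diag(p)                        # p x p identity matrix
  sigma2 * (rho * J + (1 - rho) * I)
}

# Diagonal elements are sigma2 = 2; off-diagonal elements are sigma2 * rho = 1.
intraclass_cov(p = 3, rho = 0.5, sigma2 = 2)
```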
By default, with sigma_k^2 = 1, the diagonal elements of the intraclass
covariance matrix are all 1, while the off-diagonal elements of the matrix
are all rho.
The values of rho must be between (1 - p)^(-1) and 1,
exclusively, to ensure that the covariance matrix is positive definite.
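The bounds follow from the eigenvalues of the intraclass covariance matrix, which are σ_k^2 * (1 + (p - 1) * ρ_k) once and σ_k^2 * (1 - ρ_k) with multiplicity p - 1. Both are positive exactly when ρ_k lies strictly between (1 - p)^(-1) and 1. A small numerical check, using a hypothetical helper `min_eigenvalue` that is not part of the package, might look like this:

```r
p <- 4
lower <- 1 / (1 - p)  # (1 - p)^(-1), which is -1/3 for p = 4

# Hypothetical helper (an assumption): smallest eigenvalue of the
# intraclass covariance matrix for given p, rho, and sigma2.
min_eigenvalue <- function(p, rho, sigma2 = 1) {
  Sigma <- sigma2 * (rho * matrix(1, p, p) + (1 - rho) * diag(p))
  min(eigen(Sigma, symmetric = TRUE, only.values = TRUE)$values)
}

min_eigenvalue(p, rho = 0.5)          # inside the interval: positive
min_eigenvalue(p, rho = lower - 0.1)  # below the lower bound: negative
```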
The number of classes K is determined with lazy evaluation as the
length of n.
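Lazy evaluation is what allows the default `sigma2 = rep(1, K)` to refer to K even though K is not an argument. R does not evaluate a default until the argument is first used, so K only needs to exist in the function body by that point. The function below is a hypothetical illustration of that mechanism, not the package's implementation:

```r
# Hypothetical function, written only to illustrate lazy evaluation of defaults.
show_default <- function(n, sigma2 = rep(1, K)) {
  K <- length(n)  # K is defined in the function body ...
  sigma2          # ... before the default for sigma2 is first evaluated
}

show_default(n = 3:5)                # default is used: 1 1 1
show_default(n = 3:6, sigma2 = 1:4)  # supplied value is returned unchanged
```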
named list with elements:
x: matrix of observations with sum(n) rows and p columns
y: vector of class labels that indicates class membership for each observation (row) in x.
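To make the shape of the returned list concrete, here is a minimal base R sketch of how data with this structure could be generated. It is an assumption-laden re-implementation rather than the package's code: the name `sketch_intraclass`, the use of a Cholesky factor, and the encoding of labels as integers 1 to K are all choices made for this illustration.

```r
# Hypothetical re-implementation for illustration; not the package's code.
sketch_intraclass <- function(n, p, rho, mu, sigma2 = rep(1, K)) {
  K <- length(n)
  stopifnot(length(rho) == K, length(mu) == K, length(sigma2) == K)
  x <- lapply(seq_len(K), function(k) {
    # Intraclass covariance matrix for class k
    Sigma <- sigma2[k] * (rho[k] * matrix(1, p, p) + (1 - rho[k]) * diag(p))
    R <- chol(Sigma)  # upper triangular factor with t(R) %*% R == Sigma
    z <- matrix(rnorm(n[k] * p), nrow = n[k], ncol = p)  # standard normal draws
    z %*% R + mu[k]   # each row has mean mu[k] * j_p and covariance Sigma
  })
  # Stack the classes row-wise; labels are integers 1..K (an assumption)
  list(x = do.call(rbind, x), y = rep(seq_len(K), times = n))
}

set.seed(42)
d <- sketch_intraclass(n = 3:5, p = 5, rho = seq(.1, .9, length.out = 3),
                       mu = c(0, 3, 2))
dim(d$x)    # sum(n) rows and p columns
table(d$y)  # class sizes match n
```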
# Generates data from K = 3 classes.
data <- generate_intraclass(n = 3:5, p = 5, rho = seq(.1, .9, length = 3),
                            mu = c(0, 3, 2))
data$x
data$y
# Generates data from K = 4 classes. Notice that we specify a variance.
data <- generate_intraclass(n = 3:6, p = 4, rho = seq(0, .9, length = 4),
                            mu = c(0, 3, 2, 6), sigma2 = 1:4)
data$x
data$y

