Generates data from K multivariate normal data populations, where each population (class) has an intraclass covariance matrix.
This function generates K multivariate normal data sets, where each class is generated with a constant mean vector and an intraclass covariance matrix. The data are returned as a single matrix x along with a vector of class labels y that indicates class membership.
n: vector of the sample sizes of each class. The length of n determines the number of classes K.
p: the number of features (variables) in the data.
rho: vector of the values of the off-diagonal elements for each intraclass covariance matrix. Must have the same length as n.
mu: vector containing the mean for each class. Must have the same length as n.
sigma2: vector of variances for each class. Must have the same length as n. Defaults to 1 for each class.
For simplicity, we assume that a class mean vector is constant for each feature. That is, we assume that the mean vector of the kth class is c_k * j_p, where j_p is a p × 1 vector of ones and c_k is a real scalar (the kth element of mu).
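As a small illustration of this assumption, the sketch below builds each class mean vector by repeating the scalar c_k across all p features. The helper name class_mean is hypothetical and not part of the documented interface; it assumes c_k is taken from the corresponding element of mu.

```r
# Hypothetical helper (not part of the documented API): returns c_k * j_p,
# the scalar class mean c_k repeated for each of the p features.
class_mean <- function(c_k, p) {
  c_k * rep(1, p)  # rep(1, p) plays the role of the p x 1 vector of ones j_p
}

mu <- c(0, 3, -2)  # one scalar mean per class, as in the example call
p <- 5
# Column k of the result is the mean vector of the kth class (a p x K matrix)
mean_vectors <- sapply(mu, class_mean, p = p)
mean_vectors
```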
The intraclass covariance matrix for the kth class is defined as:
σ_k^2 * (ρ_k * J_p + (1 - ρ_k) * I_p),
where J_p is the p × p matrix of ones and I_p is the p × p identity matrix.
By default, with σ_k^2 = 1, the diagonal elements of the intraclass
covariance matrix are all 1, while the off-diagonal elements of the matrix
are all ρ_k.
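The covariance formula can be written directly in base R. The sketch below is an illustration under the stated definitions, not the package's own code; intraclass_cov is a hypothetical name. With the default variance of 1, the result has 1 on the diagonal and rho everywhere else.

```r
# Hypothetical helper: sigma2 * (rho * J_p + (1 - rho) * I_p)
intraclass_cov <- function(p, rho, sigma2 = 1) {
  J_p <- matrix(1, nrow = p, ncol = p)  # p x p matrix of ones
  I_p <- diag(p)                        # p x p identity matrix
  sigma2 * (rho * J_p + (1 - rho) * I_p)
}

# Default variance: diagonal entries are 1, off-diagonal entries are rho
intraclass_cov(p = 3, rho = 0.5)
# With sigma2 = 2: diagonal entries are 2, off-diagonal entries are 2 * 0.5 = 1
intraclass_cov(p = 3, rho = 0.5, sigma2 = 2)
```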
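The values of rho must lie strictly between 1/(1 - p) and 1 to ensure that the covariance matrix is positive definite: the eigenvalues of ρ_k * J_p + (1 - ρ_k) * I_p are 1 - ρ_k (with multiplicity p - 1) and 1 + (p - 1) * ρ_k, and both are positive exactly when 1/(1 - p) < ρ_k < 1.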
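A quick numerical check of this bound, assuming p = 4 so that 1/(1 - p) = -1/3: just below the bound the matrix acquires a negative eigenvalue, and just inside it every eigenvalue is positive. min_eigen is a hypothetical helper written for this illustration.

```r
# Hypothetical helper: smallest eigenvalue of rho * J_p + (1 - rho) * I_p
min_eigen <- function(rho, p) {
  R <- rho * matrix(1, nrow = p, ncol = p) + (1 - rho) * diag(p)
  min(eigen(R, symmetric = TRUE, only.values = TRUE)$values)
}

p <- 4
1 / (1 - p)             # lower bound on rho for p = 4
min_eigen(-0.4, p = p)  # rho below the bound: smallest eigenvalue is negative
min_eigen(-0.3, p = p)  # rho inside the bound: smallest eigenvalue is positive
min_eigen(0.9, p = p)   # rho near 1: smallest eigenvalue 1 - rho is small but positive
```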
The number of classes K is determined with lazy evaluation as the length of n.
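In R, a default argument is a promise that is only evaluated when the argument is first used, so a default such as rep(1, K) may refer to a variable K that the function body defines from n. The sketch below assumes a signature of this form for illustration only; lazy_default_demo is hypothetical and is not the package's function.

```r
# Hypothetical function illustrating lazy evaluation of a default argument.
# The default rep(1, K) is not evaluated at call time; it is evaluated the
# first time sigma2 is used, by which point K has been defined from n.
lazy_default_demo <- function(n, sigma2 = rep(1, K)) {
  K <- length(n)  # number of classes, determined from the length of n
  sigma2          # forces the default, now that K exists
}

lazy_default_demo(n = 3:5)                # default: one variance of 1 per class
lazy_default_demo(n = 3:6, sigma2 = 1:4)  # user-supplied variances are returned as given
```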
Returns a named list with elements:
x: matrix of observations with sum(n) rows and p columns
y: vector of class labels that indicates class membership for each observation (row) in x
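Putting the pieces together, the following base-R sketch produces output in the shape described above: the classes are stacked row-wise into x, and y records the class of each row. It is a hypothetical re-implementation for illustration only (sketch_intraclass is not the package's function); it assumes integer class labels 1, ..., K and draws each class as z %*% chol(Sigma_k) + c_k, where z has independent standard normal entries.

```r
# Hypothetical sketch (not the package's source): stacks K classes drawn from
# N(mu[k] * j_p, Sigma_k) with intraclass covariance matrices Sigma_k.
sketch_intraclass <- function(n, p, rho, mu, sigma2 = rep(1, K)) {
  K <- length(n)  # number of classes, from the length of n
  stopifnot(length(rho) == K, length(mu) == K, length(sigma2) == K,
            all(rho > 1 / (1 - p) & rho < 1))
  x_list <- vector("list", K)
  for (k in seq_len(K)) {
    Sigma_k <- sigma2[k] * (rho[k] * matrix(1, p, p) + (1 - rho[k]) * diag(p))
    z <- matrix(rnorm(n[k] * p), nrow = n[k], ncol = p)  # iid N(0, 1) entries
    x_list[[k]] <- z %*% chol(Sigma_k) + mu[k]  # rows ~ N(mu[k] * j_p, Sigma_k)
  }
  list(x = do.call(rbind, x_list),
       y = rep(seq_len(K), times = n))  # class label for each row of x
}

set.seed(1)
data <- sketch_intraclass(n = 3:5, p = 5, rho = seq(0.1, 0.9, length.out = 3),
                          mu = c(0, 3, -2))
dim(data$x)   # 12 rows (3 + 4 + 5) and 5 columns
table(data$y)
```

Because chol() returns an upper-triangular U with t(U) %*% U equal to Sigma_k, each row of z %*% U has covariance Sigma_k; adding the scalar mu[k] shifts every feature of the kth class by c_k, matching the mean c_k * j_p.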
# Generates data from K = 3 classes.
data <- generate_intraclass(n = 3:5, p = 5, rho = seq(.1, .9, length.out = 3),
                            mu = c(0, 3, -2))
data$x
data$y

# Generates data from K = 4 classes. Notice that we also specify a variance
# for each class via sigma2.
data <- generate_intraclass(n = 3:6, p = 4, rho = seq(0, .9, length.out = 4),
                            mu = c(0, 3, -2, 6), sigma2 = 1:4)
data$x
data$y