Generates data from K multivariate normal data populations, where each population (class) has an intraclass covariance matrix.

Description

This function generates K multivariate normal data sets, where each class is generated with a constant mean vector and an intraclass covariance matrix. The data are returned as a single matrix x along with a vector of class labels y that indicates class membership.

Usage

generate_intraclass(n, p, rho, mu, sigma2 = rep(1, K))

Arguments

n

vector of the sample sizes of each class. The length of n determines the number of classes K.

p

the number of features (variables) in the data

rho

vector of the off-diagonal values rho_k, one per class, for the intraclass covariance matrices (before scaling by sigma2). Its length must equal the length of n.

mu

vector containing the constant mean for each class. Its length must equal the length of n (i.e., K).

sigma2

vector of variances for each class. Its length must equal the length of n. Default is 1 for each class.

Details

For simplicity, we assume that each class mean vector is constant across features. That is, we assume that the mean vector of the kth class is c_k * j_p, where j_p is a p × 1 vector of ones and c_k is a real scalar.

The intraclass covariance matrix for the kth class is defined as:

σ_k^2 * (ρ_k * J_p + (1 - ρ_k) * I_p),

where J_p is the p × p matrix of ones and I_p is the p × p identity matrix.
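The definition above can be sketched in base R as follows. The helper name intraclass_cov is hypothetical and is not part of the package; it only illustrates how the formula translates into code for a single class.

```r
# Hypothetical helper (not part of the package): builds
# sigma2 * (rho * J_p + (1 - rho) * I_p) for a single class.
intraclass_cov <- function(p, rho, sigma2 = 1) {
  J <- matrix(1, nrow = p, ncol = p)  # p x p matrix of ones
  I <- diag(p)                        # p x p identity matrix
  sigma2 * (rho * J + (1 - rho) * I)
}

# Diagonal entries equal sigma2; off-diagonal entries equal sigma2 * rho.
Sigma <- intraclass_cov(p = 3, rho = 0.5, sigma2 = 2)
Sigma
```

With sigma2 left at 1, the same helper returns a matrix with ones on the diagonal and rho everywhere else.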

By default, with σ_k^2 = 1, the diagonal elements of the intraclass covariance matrix are all 1, while the off-diagonal elements are all ρ_k. In general, the diagonal elements equal σ_k^2 and the off-diagonal elements equal σ_k^2 * ρ_k.

Each value of rho must lie strictly between (1 - p)^(-1) = -1 / (p - 1) and 1 to ensure that the covariance matrix is positive definite.
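The bound follows from the eigenvalues of ρ J_p + (1 - ρ) I_p, which are 1 + (p - 1) ρ (once) and 1 - ρ (with multiplicity p - 1); both are positive exactly when ρ lies in the stated interval. A minimal numerical check, assuming a hypothetical helper is_pd and an arbitrary tolerance of 1e-8 (neither comes from the package):

```r
p <- 4
lower <- 1 / (1 - p)  # lower bound for rho when p = 4

# Hypothetical helper: TRUE when rho * J_p + (1 - rho) * I_p has
# all eigenvalues above a small numerical tolerance.
is_pd <- function(rho, p) {
  Sigma <- rho * matrix(1, p, p) + (1 - rho) * diag(p)
  ev <- eigen(Sigma, symmetric = TRUE, only.values = TRUE)$values
  all(ev > 1e-8)
}

is_pd(0.5, p)    # inside the interval
is_pd(-0.2, p)   # inside the interval, negative rho
is_pd(lower, p)  # on the lower boundary: smallest eigenvalue is 0
is_pd(1, p)      # on the upper boundary: matrix of ones, rank 1
```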

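The number of classes K is taken to be the length of n. Because R evaluates default arguments lazily, the default sigma2 = rep(1, K) can refer to K even though K is not itself an argument.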
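The lazy-evaluation pattern can be illustrated with a small stand-alone function. The name default_variances is hypothetical; the sketch only assumes that K is assigned inside the function body before sigma2 is first used, which is what the usage signature suggests.

```r
# Hypothetical function mirroring the documented default sigma2 = rep(1, K).
default_variances <- function(n, sigma2 = rep(1, K)) {
  K <- length(n)  # K is created in the body, after the default mentions it
  sigma2          # the default expression is only evaluated here, once K exists
}

default_variances(n = 3:5)                       # default: one variance of 1 per class
default_variances(n = 3:5, sigma2 = c(1, 2, 3))  # explicit variances are returned as given
```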

Value

named list with elements:

  • x: matrix of observations with sum(n) rows (one per observation across all classes) and p columns

  • y: vector of class labels that indicates class membership for each observation (row) in x.
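To make the shape of the returned list concrete, here is a minimal base-R sketch of a generator with the same arguments. It is a hypothetical stand-in named sketch_intraclass, not the package's implementation: it samples through a Cholesky factor and returns integer class labels, whereas the package may use a different sampler and label type.

```r
# Hypothetical stand-in for illustration; not the package's implementation.
sketch_intraclass <- function(n, p, rho, mu, sigma2 = rep(1, length(n))) {
  K <- length(n)
  stopifnot(length(rho) == K, length(mu) == K, length(sigma2) == K)
  x <- lapply(seq_len(K), function(k) {
    Sigma <- sigma2[k] * (rho[k] * matrix(1, p, p) + (1 - rho[k]) * diag(p))
    z <- matrix(rnorm(n[k] * p), nrow = n[k], ncol = p)  # independent N(0, 1) draws
    # Rows of z %*% chol(Sigma) have covariance Sigma; adding mu[k]
    # shifts every feature of class k by the same constant mean.
    z %*% chol(Sigma) + mu[k]
  })
  list(x = do.call(rbind, x), y = rep(seq_len(K), times = n))
}

set.seed(42)
sim <- sketch_intraclass(n = 3:5, p = 5, rho = c(0.1, 0.5, 0.9), mu = c(0, 3, -2))
dim(sim$x)    # sum(n) rows and p columns
table(sim$y)  # class counts match n
```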

Examples

# Generates data from K = 3 classes.
data <- generate_intraclass(n = 3:5, p = 5, rho = seq(.1, .9, length = 3),
                            mu = c(0, 3, -2))
data$x
data$y

# Generates data from K = 4 classes. Notice that we specify a variance for each class.
data <- generate_intraclass(n = 3:6, p = 4, rho = seq(0, .9, length = 4),
                            mu = c(0, 3, -2, 6), sigma2 = 1:4)
data$x
data$y
