simdata_guo: Generates data from 'K' multivariate normal data populations...

Description

We generate n_k observations (k = 1, …, K) from each of K multivariate normal distributions. Let the kth population have a p-dimensional multivariate normal distribution, N_p(μ_k, Σ_k) with mean vector μ_k and positive-definite covariance matrix Σ_k. Each covariance matrix Σ_k consists of block-diagonal autocorrelation matrices.

Usage

simdata_guo(n, mean, block_size, num_blocks, rho,
  sigma2 = 1, seed = NULL)

Arguments

n

a vector (of length K) of the sample sizes for each population

mean

a vector or a list (of length K) of mean vectors

block_size

a vector (of length K) of the sizes of the square block matrices for each population. See details.

num_blocks

a vector (of length K) giving the number of block matrices for each population. See details.

rho

a vector (of length K) of the values of the autocorrelation parameter for each class covariance matrix

sigma2

a vector (of length K) of the variance coefficients for each class covariance matrix

seed

seed for random number generation. If NULL, the seed is not set.

Details

The kth class covariance matrix is defined as

Σ_k = Σ^{(ρ)} \oplus Σ^{(-ρ)} \oplus … \oplus Σ^{(ρ)},

where \oplus denotes the direct sum and the (i,j)th entry of Σ^{(ρ)} is

Σ_{ij}^{(ρ)} = ρ^{|i - j|}.

The matrix Σ^{(ρ)} is referred to as a block. Its dimensions are provided in the block_size argument, and the number of blocks is specified in the num_blocks argument.

Each matrix Σ_k is generated by the cov_block_autocorrelation function.
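
To make the block structure concrete, the following base-R sketch builds a matrix of the form given above. It is not the code of cov_block_autocorrelation: the helper names autocorr_block and block_autocorr_cov are hypothetical, the alternation of ρ and -ρ follows the formula as written, and treating sigma2 as a scalar multiplier of the whole matrix is an assumption.

```r
# Hypothetical helper: p x p autocorrelation block with entries rho^|i - j|.
autocorr_block <- function(p, rho) {
  idx <- seq_len(p)
  rho^abs(outer(idx, idx, "-"))
}

# Hypothetical helper: direct sum of num_blocks blocks whose parameter
# alternates rho, -rho, rho, ... (as the formula reads), scaled by sigma2
# (assumed to act as a scalar multiplier).
block_autocorr_cov <- function(block_size, num_blocks, rho, sigma2 = 1) {
  p <- block_size * num_blocks
  sigma <- matrix(0, nrow = p, ncol = p)
  for (b in seq_len(num_blocks)) {
    rows <- (b - 1) * block_size + seq_len(block_size)
    sign_b <- if (b %% 2 == 1) 1 else -1
    # Fill the b-th diagonal block; off-diagonal blocks stay zero.
    sigma[rows, rows] <- autocorr_block(block_size, sign_b * rho)
  }
  sigma2 * sigma
}

sigma_k <- block_autocorr_cov(block_size = 3, num_blocks = 3, rho = 0.9)
dim(sigma_k)
round(sigma_k[1:6, 1:6], 2)
```

Entries outside the diagonal blocks are zero, so features in different blocks are uncorrelated, while features within a block have correlations that decay geometrically with |i - j|.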

The number of populations, K, is determined from the length of the vector of sample sizes, n. The mean vectors can be given in a list of length K. If one mean is given (as a vector or a list having 1 element), then all populations share this common mean.

The block sizes can be given as a numeric vector or as a single value, in which case the value is replicated K times. The same logic applies to num_blocks, rho, and sigma2.

For each class, the number of features, p, is computed as block_size * num_blocks. The values for p must agree for each class.
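
The argument handling described above can be sketched in base R as follows. The helper recycle_to_K is hypothetical and only illustrates the documented behavior; the package's internal checks may differ.

```r
# Hypothetical helper: replicate a single value K times, otherwise
# require the argument to already have length K.
recycle_to_K <- function(x, K, name) {
  if (length(x) == 1) {
    x <- rep(x, K)
  }
  if (length(x) != K) {
    stop("'", name, "' must have length 1 or ", K)
  }
  x
}

n <- c(15, 15, 15)
K <- length(n)  # number of populations, taken from the sample sizes

block_size <- recycle_to_K(c(2, 4, 8), K, "block_size")
num_blocks <- recycle_to_K(c(8, 4, 2), K, "num_blocks")
rho        <- recycle_to_K(0.9, K, "rho")      # single value, replicated
sigma2     <- recycle_to_K(1, K, "sigma2")     # single value, replicated

# Number of features per class; all classes must agree.
p <- block_size * num_blocks
if (length(unique(p)) != 1) {
  stop("block_size * num_blocks must be the same for every class")
}
p <- p[1]
```

Here every class has 16 features even though the block sizes differ, because each block_size is paired with a matching num_blocks.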

The block-diagonal covariance matrix with autocorrelated blocks was popularized by Guo et al. (2007) for studying classification of high-dimensional data.

Value

named list containing:

x:

A matrix whose rows are the observations generated and whose columns are the p features (variables)

y:

A vector denoting the population from which the observation in each row was generated.
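
The layout of the returned list can be illustrated with a small base-R sampler. This is a sketch under stated assumptions, not the package's implementation: sim_mvn_classes is a hypothetical stand-in, it draws samples through a Cholesky factor, and it labels the populations with the integers 1, ..., K, which may not match the label type simdata_guo returns.

```r
# Hypothetical stand-in that mimics the documented return value:
# a list with a matrix x (rows = observations) and a label vector y.
sim_mvn_classes <- function(n, means, covs, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  K <- length(n)
  x <- do.call(rbind, lapply(seq_len(K), function(k) {
    p <- length(means[[k]])
    z <- matrix(rnorm(n[k] * p), nrow = n[k], ncol = p)
    # chol() returns upper-triangular R with t(R) %*% R = Sigma_k, so the
    # rows of z %*% R have covariance Sigma_k; then shift by the mean.
    sweep(z %*% chol(covs[[k]]), 2, means[[k]], "+")
  }))
  y <- rep(seq_len(K), times = n)  # assumed integer labels 1..K
  list(x = x, y = y)
}

p <- 4
ar_cov <- 0.5^abs(outer(seq_len(p), seq_len(p), "-"))
sim <- sim_mvn_classes(n = c(10, 20),
                       means = list(rep(0, p), rep(5, p)),
                       covs = list(ar_cov, 2 * ar_cov),
                       seed = 42)
dim(sim$x)     # sum(n) rows and p columns
table(sim$y)   # one count per population
```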

References

Guo, Y., Hastie, T., & Tibshirani, R. (2007). "Regularized linear discriminant analysis and its application in microarrays," Biostatistics, 8, 1, 86-100.

Examples

# Generates 10 observations from each of two multivariate normal populations
# having a block-diagonal autocorrelation structure.
block_size <- 3
num_blocks <- 3
p <- block_size * num_blocks
means_list <- list(seq_len(p), -seq_len(p))
data <- simdata_guo(n = c(10, 10), mean = means_list, block_size = block_size,
                    num_blocks = num_blocks, rho = 0.9, seed = 42)
dim(data$x)
table(data$y)

# Generates 15 observations from each of three multivariate normal
# populations having block-diagonal autocorrelation structures. The
# covariance matrices are unequal.
p <- 16
block_size <- c(2, 4, 8)
num_blocks <- p / block_size
rho <- c(0.1, 0.5, 0.9)
sigma2 <- 1:3
mean_list <- list(rep.int(-5, p), rep.int(0, p), rep.int(5, p))

set.seed(42)
data2 <- simdata_guo(n = c(15, 15, 15), mean = mean_list,
                     block_size = block_size, num_blocks = num_blocks,
                     rho = rho, sigma2 = sigma2)
dim(data2$x)
table(data2$y)

ramhiser/sortinghat documentation built on May 26, 2019, 10:12 p.m.