View source: R/simdata-friedman.r
Description

In a widely cited paper, Friedman (1989) described six simulation configurations for studying classifiers. This function provides an interface to all six configurations.
Usage

simdata_friedman(n = rep(15, K), p = 10, experiment = 1, seed = NULL)
Arguments

n           a vector (of length K = 3) of the sample sizes for each population
p           the number of features of the generated data
experiment  the experiment number (1 through 6) from Friedman's (1989) RDA paper
seed        seed for random number generation (if NULL, the seed is not set)
Details

We generate n_k observations (k = 1, …, K) from each of K = 3 multivariate normal distributions. The kth population has a p-dimensional multivariate normal distribution, N_p(μ_k, Σ_k), with mean vector μ_k and positive-definite covariance matrix Σ_k. The structure of each covariance matrix Σ_k depends on the experiment chosen.
Here, we provide a brief description of each of the six experimental configurations. For more information, see Friedman (1989). We use Friedman's original setup, except that we fix the number of observations per class rather than choosing it randomly.
Define I_p as the p-dimensional identity matrix.
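For concreteness, the following is a minimal sketch (not the package's internal implementation) of how a single class can be sampled from N_p(μ_k, Σ_k) using MASS::mvrnorm:

library(MASS)
p <- 10
mu_k <- rep(0, p)    # class mean (here, the origin)
Sigma_k <- diag(p)   # class covariance (here, I_p)
n_k <- 15            # class sample size
x_k <- mvrnorm(n = n_k, mu = mu_k, Sigma = Sigma_k)
dim(x_k)             # n_k rows, p columns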
Experiment #1 – Equal, Spherical Covariance Matrices
Each of the three classes is generated from a population with covariance matrix I_p. The population mean of the first class is the origin. The means of the other two classes are at a distance of 3.0 from the origin in two orthogonal directions.
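One way to write down these parameters in R is sketched below; placing the shifts along the first two coordinate axes is an illustrative assumption, since any pair of orthogonal directions satisfies the description above.

p <- 10
Sigma <- diag(p)               # common covariance matrix I_p
mu1 <- rep(0, p)               # class 1: the origin
mu2 <- c(3, rep(0, p - 1))     # class 2: distance 3.0 along the first axis
mu3 <- c(0, 3, rep(0, p - 2))  # class 3: distance 3.0 along the second axis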
Experiment #2 – Unequal, Spherical Covariance Matrices
Let each of the classes have covariance matrix k · I_p, where k is the class number (1 ≤ k ≤ 3). As in Experiment #1, the population mean of the first class is the origin; the means for classes 2 and 3 are shifted in orthogonal directions, class 2 by a distance of 3.0 and class 3 by a distance of 4.0.
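A sketch of these parameters, again with illustrative coordinate-axis directions:

p <- 10
Sigma <- lapply(1:3, function(k) k * diag(p))  # Sigma_k = k * I_p
mu1 <- rep(0, p)
mu2 <- c(3, rep(0, p - 1))     # class 2: shifted by 3.0
mu3 <- c(0, 4, rep(0, p - 2))  # class 3: shifted by 4.0 in an orthogonal direction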
Experiment #3 – Equal, Highly Ellipsoidal Covariance Matrices
The covariance matrices of all three classes are equal and highly ellipsoidal. The location differences between the classes are concentrated in the low-variance subspace. The jth eigenvalue (j = 1, …, p) of the common covariance matrix is
e_j = [9 (j - 1) / (p - 1) + 1]^2,
so that the ratio of the largest to smallest eigenvalues is 100.
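This ratio can be verified numerically:

p <- 10
j <- 1:p
e <- (9 * (j - 1) / (p - 1) + 1)^2
max(e) / min(e)  # 100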
The population mean of the first class is the origin. The mean vectors for the second and third classes are defined in terms of the population eigenvalues. The mean of the jth feature for class 2 is
μ_{2j} = 2.5 √(e_j / p) · (p - j) / (p/2 - 1),
where e_j is the jth eigenvalue given above. The mean of the jth feature for class 3 is
μ_{3j} = (-1)^j μ_{2j}.
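A sketch of these mean vectors in R, following the formulas above:

p <- 10
j <- 1:p
e <- (9 * (j - 1) / (p - 1) + 1)^2
mu2 <- 2.5 * sqrt(e / p) * (p - j) / (p / 2 - 1)
mu3 <- (-1)^j * mu2
# Relative to the scale of each direction, mu2 / sqrt(e) decreases in j,
# concentrating the class separation in the low-variance subspace.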
Experiment #4 – Equal, Highly Ellipsoidal Covariance Matrices
Similar to Experiment #3, the covariance matrices of all three classes are equal and highly ellipsoidal. However, in this experiment the location differences between the classes are concentrated in the high-variance subspace. The jth eigenvalue (j = 1, …, p) of the common covariance matrix is
e_j = [9 (j - 1) / (p - 1) + 1]^2,
so the ratio of the largest to smallest eigenvalues is 100.
The population mean of the first class is the origin. The mean vectors for the second and third classes are defined in terms of the population eigenvalues. The mean of the jth feature for class 2 is
μ_{2j} = 2.5 √(e_j / p) · (j - 1) / (p/2 - 1),
where e_j is the jth eigenvalue given above. The mean of the jth feature for class 3 is
μ_{3j} = (-1)^j μ_{2j}.
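The corresponding sketch differs from Experiment #3 only in the factor (j - 1), which concentrates the standardized separation in the high-variance subspace:

p <- 10
j <- 1:p
e <- (9 * (j - 1) / (p - 1) + 1)^2
mu2 <- 2.5 * sqrt(e / p) * (j - 1) / (p / 2 - 1)
mu3 <- (-1)^j * mu2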
Experiment #5 – Unequal, Highly Ellipsoidal Covariance Matrices
In this experiment, the class covariance matrices are highly ellipsoidal and very unequal. The eigenvalues for the first class are given by
e_{1j} = [9 (j - 1) / (p - 1) + 1]^2,
so that the ratio of the largest to smallest eigenvalues is 100. The eigenvalues for the second class are
e_{2j} = [9 (p - j) / (p - 1) + 1]^2.
The eigenvalues for class 3 are given by
e_{3j} = {9 [j - (p - 1)/2] / (p - 1)}^2.
For the first two classes, the ratio of the largest to the smallest eigenvalues is 100, but their high- and low-variance subspaces are complementary: where class 1 has high variance, class 2 has low variance, and vice versa. For the third class, this ratio is (p + 1)^2. The third class has low variance in the subspace where the first two classes have intermediate variance, and high variance where they have their extreme (high or low) variances.
The mean vector of each class is the origin, so the class distributions differ only in their covariance matrices.
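These spectra and their ratios can be inspected numerically (p is assumed even here, so all of class 3's eigenvalues are strictly positive):

p <- 10
j <- 1:p
e1 <- (9 * (j - 1) / (p - 1) + 1)^2
e2 <- (9 * (p - j) / (p - 1) + 1)^2
e3 <- (9 * (j - (p - 1) / 2) / (p - 1))^2
max(e1) / min(e1)  # 100
max(e2) / min(e2)  # 100
max(e3) / min(e3)  # (p + 1)^2 = 121 for p = 10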
Experiment #6 – Unequal, Highly Ellipsoidal Covariance Matrices
This experiment uses the same covariance structures described for Experiment #5. The population means, however, are different. The mean vector of the first class is the origin. The mean of the jth feature for class 2 is
μ_{2j} = 14 / √p,
and class 3's mean vector is defined such that
μ_{3j} = (-1)^j μ_{2j}.
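A sketch of these mean vectors:

p <- 10
j <- 1:p
mu2 <- rep(14 / sqrt(p), p)  # constant shift of 14 / sqrt(p) per feature
mu3 <- (-1)^j * mu2          # same magnitudes with alternating signs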
Value

A named list containing:

x  a matrix whose rows are the generated observations and whose columns are the p features (variables)
y  a vector denoting the population from which each row of x was generated
References

Friedman, J. H. (1989), "Regularized Discriminant Analysis," Journal of the American Statistical Association, 84(405), 165-175.
Examples

# Generates 10, 20, and 30 observations, respectively, from three
# multivariate normal populations having the covariance structure given in
# Friedman's (1989) fifth experiment.
sample_sizes <- c(10, 20, 30)
p <- 20
data <- simdata_friedman(n = sample_sizes, p = p, experiment = 5, seed = 42)
dim(data$x)
table(data$y)

# Generates 15 observations from each of three multivariate normal
# populations having the covariance structure given in Friedman's (1989)
# sixth experiment.
sample_sizes <- c(15, 15, 15)
p <- 20
set.seed(42)
data2 <- simdata_friedman(n = sample_sizes, p = p, experiment = 6)
dim(data2$x)
table(data2$y)