simdata_t: Generates random variates from K multivariate Student's t...



Description

We generate n_k observations (k = 1, …, K) from each of K multivariate Student's t distributions, whose centroids, covariance matrices, and degrees of freedom are specified by the user.

Usage

simdata_t(n, centroid, cov, df, seed = NULL)

Arguments

n

a vector (of length K) of the sample sizes for each population

centroid

a vector or a list (of length K) of centroid vectors

cov

a symmetric matrix or a list (of length K) of symmetric covariance matrices.

df

a vector (of length K) of the degrees of freedom for each population

seed

seed for random number generation (if NULL, the seed is not set)
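The `seed` argument most likely follows the common R convention sketched below: a non-NULL seed is passed to `set.seed()` before sampling, so two calls with the same seed return identical data, while `NULL` leaves the random number generator unseeded. This is a minimal sketch under that assumption; `maybe_set_seed` is a hypothetical helper, not part of sortinghat.

```r
# Hypothetical helper illustrating the assumed `seed` convention:
# call set.seed() only when a seed is supplied.
maybe_set_seed <- function(seed = NULL) {
  if (!is.null(seed)) {
    set.seed(seed)
  }
  invisible(NULL)
}

maybe_set_seed(42)
a <- rnorm(3)
maybe_set_seed(42)
b <- rnorm(3)
identical(a, b)  # TRUE: the same seed reproduces the same draws
```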

Value

A list with the following components:

x

A matrix whose rows are the generated observations and whose columns are the p features (variables).

y

A vector denoting the population from which the observation in each row was generated.
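To illustrate how `x` and `y` line up row by row, the sketch below stacks K per-population matrices with `rbind()` and builds the matching labels with `rep()`. It assumes that rows are ordered population by population and that labels are the integers 1, …, K; the package itself may order or encode them differently (for example, as a factor). The placeholder matrices stand in for simulated data.

```r
# Sample sizes for K = 3 hypothetical populations with p = 2 features.
n <- c(2, 3, 4)
p <- 2

# Placeholder per-population matrices (every entry equals the population
# index) standing in for the simulated observations.
x_list <- lapply(seq_along(n), function(k) matrix(k, nrow = n[k], ncol = p))

# Stack the observations and record which population each row came from.
x <- do.call(rbind, x_list)
y <- rep(seq_along(n), times = n)

dim(x)    # sum(n) = 9 rows by p = 2 columns
table(y)  # 2, 3, and 4 observations in populations 1, 2, and 3
```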

Details

Let Π_k denote the kth population with a p-dimensional multivariate Student's t distribution, T_p(μ_k, Σ_k, c_k), where μ_k is the population location vector, Σ_k is the positive-definite scale matrix (supplied through the cov argument), and c_k is the degrees of freedom. For c_k > 2, the covariance matrix of Π_k is c_k/(c_k − 2) Σ_k.
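One standard way to draw from T_p(μ_k, Σ_k, c_k) is the normal scale mixture: if Z ~ N_p(0, Σ_k) and W ~ χ²_{c_k} are independent, then μ_k + Z / sqrt(W / c_k) follows T_p(μ_k, Σ_k, c_k). The base-R sketch below uses that construction with a Cholesky factor of Σ_k. `rmvt_sketch` is a hypothetical name, and the package may use a different sampling routine internally.

```r
# Draw n observations from a p-dimensional multivariate t distribution
# with location mu, positive-definite scale matrix sigma, and df degrees
# of freedom, using the normal / chi-squared scale-mixture construction.
rmvt_sketch <- function(n, mu, sigma, df) {
  p <- length(mu)
  stopifnot(all(dim(sigma) == c(p, p)), df > 0)
  # chol() returns upper-triangular R with t(R) %*% R == sigma, so the
  # rows of z are iid N_p(0, sigma).
  z <- matrix(rnorm(n * p), nrow = n, ncol = p) %*% chol(sigma)
  # One chi-squared mixing variable per observation (per row).
  w <- rchisq(n, df = df)
  # Divide each row by sqrt(w / df), then shift each column by mu.
  sweep(z / sqrt(w / df), 2, mu, FUN = "+")
}

set.seed(1)
draws <- rmvt_sketch(5000, mu = c(3, 0), sigma = diag(2), df = 4)
dim(draws)       # 5000 rows, 2 columns
colMeans(draws)  # close to c(3, 0)
```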

Smaller values of c_k yield heavier tails and, therefore, more outlying observations on average; as c_k → ∞, T_p(μ_k, Σ_k, c_k) converges to the multivariate normal distribution N_p(μ_k, Σ_k).
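The effect of c_k on outliers can be checked empirically. The self-contained sketch below draws from T_2(0, I, c) with a normal scale-mixture sampler and reports the proportion of draws whose squared Mahalanobis distance from the location exceeds the 0.999 quantile of χ²_2, a cutoff that a normal distribution with the same location and scale matrix would exceed only about 0.1% of the time. The cutoff, sample size, and `outlier_rate` helper are illustrative choices, not part of sortinghat.

```r
# Proportion of draws from T_p(mu, sigma, df) whose squared Mahalanobis
# distance from mu exceeds the 0.999 chi-squared quantile (the cutoff a
# multivariate normal with the same mu and sigma exceeds 0.1% of the time).
outlier_rate <- function(n, mu, sigma, df) {
  p <- length(mu)
  z <- matrix(rnorm(n * p), nrow = n, ncol = p) %*% chol(sigma)
  x <- sweep(z / sqrt(rchisq(n, df = df) / df), 2, mu, FUN = "+")
  d2 <- mahalanobis(x, center = mu, cov = sigma)
  mean(d2 > qchisq(0.999, df = p))
}

set.seed(42)
rates <- sapply(c(3, 10, 100), function(df) {
  outlier_rate(1e5, mu = c(0, 0), sigma = diag(2), df = df)
})
names(rates) <- c("df = 3", "df = 10", "df = 100")
rates  # the rate shrinks toward 0.001 as df grows
```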

The number of populations, K, is determined from the length of the vector of sample sizes, n. The centroid vectors and covariance matrices can each be given as a list of length K. If only one covariance matrix is given (as a matrix or as a list with one element), then all populations share this common covariance matrix. The same logic applies to the population centroids. The degrees of freedom can be given as a numeric vector of length K or as a single value, in which case that value is replicated K times.
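The recycling rules for centroids, covariance matrices, and degrees of freedom can be sketched as a small normalization step that turns each argument into a list or vector of length K = length(n). `recycle_params` is a hypothetical helper written to mirror the behavior described in this section; it is not the package's internal code.

```r
# Normalize centroid, cov, and df to length K = length(n): a single
# centroid, covariance matrix, or df value is shared by all K populations.
recycle_params <- function(n, centroid, cov, df) {
  K <- length(n)
  # A bare vector or matrix is wrapped in a list of length 1.
  if (!is.list(centroid)) centroid <- list(centroid)
  if (!is.list(cov)) cov <- list(cov)
  # Replicate single elements K times.
  if (length(centroid) == 1) centroid <- rep(centroid, K)
  if (length(cov) == 1) cov <- rep(cov, K)
  if (length(df) == 1) df <- rep(df, K)
  stopifnot(length(centroid) == K, length(cov) == K, length(df) == K)
  list(centroid = centroid, cov = cov, df = df)
}

params <- recycle_params(n = c(10, 10, 10),
                         centroid = list(c(-3, -3), c(0, 0), c(3, 3)),
                         cov = diag(2), df = 4)
length(params$cov)  # 3 copies of the 2 x 2 identity matrix
params$df           # 4 4 4
```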

Examples

library(sortinghat)

# Generates 10 observations from each of two multivariate t populations
# with equal covariance matrices and equal degrees of freedom.
centroid_list <- list(c(3, 0), c(0, 3))
cov_identity <- diag(2)
data_generated <- simdata_t(n = c(10, 10), centroid = centroid_list,
                            cov = cov_identity, df = 4, seed = 42)
dim(data_generated$x)
table(data_generated$y)

# Generates 10 observations from each of three multivariate t populations
# with unequal covariance matrices and unequal degrees of freedom.
set.seed(42)
centroid_list <- list(c(-3, -3), c(0, 0), c(3, 3))
cov_list <- list(cov_identity, 2 * cov_identity, 3 * cov_identity)
data_generated2 <- simdata_t(n = c(10, 10, 10), centroid = centroid_list,
                             cov = cov_list, df = c(4, 6, 10))
dim(data_generated2$x)
table(data_generated2$y)

sortinghat documentation built on May 30, 2017, 4:52 a.m.