We generate n_m observations (m = 1, …, M) from each of M multivariate Student's t distributions such that the Euclidean distance between each of the means and the origin is equal and scaled by Δ ≥ 0.
n: a vector (of length M) of the sample sizes for each population
p: the dimension of the multivariate Student's t distributions
df: a vector (of length M) of the degrees of freedom for each population
delta: the fixed distance between each population and the origin
Sigma: the common covariance matrix
seed: seed for random number generation (if NULL, does not set seed)
Let Π_m denote the mth population with a p-dimensional multivariate Student's t distribution, T_p(μ_m, Σ_m, c_m), where μ_m is the population location vector, Σ_m is the positive-definite covariance matrix, and c_m is the degrees of freedom.
Let e_m be the mth standard basis vector (i.e., the mth element is 1 and the remaining values are 0). Then, we define
μ_m = Δ ∑_{j=1}^{p/M} e_{(p/M)(m-1) + j}.
Note that p must be divisible by M. By default, the first 10 dimensions of μ_1 are set to delta with all remaining dimensions set to 0, the second 10 dimensions of μ_2 are set to delta with all remaining dimensions set to 0, and so on.
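The block structure of the location vectors can be sketched in base R as follows. This is only an illustration of the formula above: `make_means` is a hypothetical helper, not a function from the package, and the values M = 5, p = 50, delta = 2 are chosen for the example. With this construction every μ_m has Euclidean norm Δ √(p/M), which is why all means are equidistant from the origin.

```r
# Hypothetical helper (not part of the package): build the M location
# vectors mu_1, ..., mu_M as the rows of an M x p matrix.
make_means <- function(M = 5, p = 50, delta = 1) {
  stopifnot(p %% M == 0)          # p must be divisible by M
  block <- p / M                  # number of nonzero entries per mean
  t(sapply(seq_len(M), function(m) {
    mu <- numeric(p)
    # entries (p/M)(m - 1) + 1, ..., (p/M) m are set to delta
    mu[(block * (m - 1) + 1):(block * m)] <- delta
    mu
  }))
}

mu <- make_means(M = 5, p = 50, delta = 2)
dim(mu)                           # one row per population
norms <- sqrt(rowSums(mu^2))      # distance of each mean from the origin
```

Each row of `mu` has a different block of 10 coordinates set to `delta`, so the means are mutually orthogonal whenever `delta` is positive.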
We use a common covariance matrix Σ_m = Σ for all populations.
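For intuition, one standard way to draw from a multivariate Student's t distribution with a given location and matrix Σ is the normal/chi-square mixture representation. The sketch below uses only base R; `rmvt_sketch` is a hypothetical name, and this is not claimed to be how `sim_student` is implemented. In this construction Σ acts as a scale matrix, so the resulting covariance is df / (df - 2) · Σ when df > 2.

```r
# Hypothetical helper: draw n observations from a p-dimensional
# Student's t distribution via x = mu + z / sqrt(w / df),
# where z ~ N(0, Sigma) and w ~ chi-square(df) are independent.
rmvt_sketch <- function(n, mu, Sigma, df) {
  p <- length(mu)
  R <- chol(Sigma)                        # Sigma = t(R) %*% R
  z <- matrix(rnorm(n * p), n, p) %*% R   # rows are N(0, Sigma) draws
  w <- sqrt(rchisq(n, df) / df)           # one scale factor per row
  sweep(z / w, 2, mu, "+")                # shift every row by mu
}

set.seed(1)
x <- rmvt_sketch(n = 25, mu = c(2, 0, 0), Sigma = diag(3), df = 6)
dim(x)
```

Calling such a helper once per population with the same `Sigma` but different `mu` and `df` reproduces the common-Σ setup described above.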
For small values of c_m, the tails are heavier, and, therefore, the average number of outlying observations is increased.
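The effect of the degrees of freedom on the tails can be checked numerically. The sketch below looks at a single univariate Student's t marginal (an illustrative simplification) and compares the probability of falling more than 3 units from the center for a few arbitrarily chosen degrees of freedom, with the standard normal as a reference.

```r
# Two-sided tail probability P(|T| > 3) for several degrees of freedom.
df <- c(2, 6, 30)
tail_t <- 2 * pt(-3, df = df)

# Same tail probability under a standard normal, for reference.
tail_normal <- 2 * pnorm(-3)

round(c(setNames(tail_t, paste0("t_df", df)), normal = tail_normal), 4)
```

The tail probability shrinks as the degrees of freedom grow and approaches the normal value, so populations with small c_m produce outlying observations more often.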
By default, we let M = 5, Δ = 0, Σ_m = I_p, and c_m = 6, m = 1, …, M, where I_p denotes the p × p identity matrix. Furthermore, we generate 25 observations from each population by default.
For Δ = 0 and c_m = c, m = 1, …, M, the M populations are equal.
named list containing:
x: A matrix whose rows are the observations generated and whose columns are the p features (variables)
y: A vector denoting the population from which the observation in each row was generated.
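# Generate 10, 20, ..., 50 observations from the five populations,
# with a fixed seed for reproducibility.
data_generated <- sim_student(n = 10 * seq_len(5), seed = 42)
dim(data_generated$x)
table(data_generated$y)

# Generate the default 25 observations per population in p = 10
# dimensions, with delta = 2 and heavy tails (df = 2).
data_generated2 <- sim_student(p = 10, delta = 2, df = rep(2, 5))
table(data_generated2$y)

# Per-population sample means, one row per population.
sample_means <- with(data_generated2,
                     tapply(seq_along(y), y, function(i) {
                       colMeans(x[i, , drop = FALSE])
                     }))
(sample_means <- do.call(rbind, sample_means))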