We generate *n_m* observations from each of *M* multivariate Student's t
distributions (*m = 1, …, M*) such that every population mean lies at the
same Euclidean distance from the origin, with that distance scaled by
*Δ ≥ 0*.


| Argument | Description |
|---|---|
| `n` | a vector (of length M) of the sample sizes for each population |
| `p` | the dimension of the multivariate Student's t distributions |
| `df` | a vector (of length M) of the degrees of freedom for each population |
| `delta` | the value Δ that scales the distance between each population mean and the origin |
| `Sigma` | the common covariance matrix |
| `seed` | seed for random number generation (if `NULL`, the seed is not set) |
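The arguments above carry a few implicit constraints: `n` and `df` need one entry per population, `p` must be divisible by the number of populations *M* (taken here as `length(n)`), `delta` must be nonnegative, and `Sigma` must be *p × p*. The helper below is a hypothetical sketch that checks these before simulating; `check_sim_student_args` is our own name and is not part of the package.

```r
# Hypothetical helper (not part of the package): validates the argument
# constraints described in the table above.
check_sim_student_args <- function(n, p, df, delta, Sigma) {
  M <- length(n)
  # n and df must both have one entry per population
  if (length(df) != M) stop("`df` must have the same length as `n`")
  # each population mean uses p / M coordinates, so M must divide p
  if (p %% M != 0) stop("`p` must be divisible by length(`n`)")
  if (delta < 0) stop("`delta` must be nonnegative")
  # Sigma must be a p x p matrix
  if (!is.matrix(Sigma) || any(dim(Sigma) != c(p, p))) {
    stop("`Sigma` must be a p x p matrix")
  }
  invisible(TRUE)
}

# Passes: M = 5 divides p = 10
check_sim_student_args(n = rep(25, 5), p = 10, df = rep(6, 5),
                       delta = 2, Sigma = diag(10))
```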

Let *Π_m* denote the *m*th population with a
*p*-dimensional multivariate Student's t
distribution, *T_p(μ_m, Σ_m, c_m)*, where
*μ_m* is the population location vector,
*Σ_m* is the positive-definite covariance
matrix, and *c_m* is the degrees of freedom.

Let *e_m* be the *m*th standard basis vector
(i.e., the *m*th element is 1 and the remaining
values are 0). Then, we define

$$\mu_m = \Delta \sum_{j=1}^{p/M} e_{(p/M)(m-1) + j}.$$
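A short calculation makes the "equal distance" claim precise. For each *m*, the indices *(p/M)(m-1) + j*, *j = 1, …, p/M*, pick out a block of *p/M* consecutive coordinates, and the blocks for different populations do not overlap. So *μ_m* has exactly *p/M* entries equal to *Δ* and the rest equal to 0, which gives

$$\|\mu_m\|_2 = \Big(\sum_{j=1}^{p/M} \Delta^2\Big)^{1/2} = \Delta\sqrt{p/M}, \qquad m = 1, \ldots, M.$$

Every mean is therefore at the same distance from the origin, and that distance is proportional to *Δ* (it equals *Δ* only when *p = M*). Because the blocks are disjoint, the mean vectors are also mutually orthogonal.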

Note that `p` must be divisible by `M`. By default, *p/M = 10*: the first 10
dimensions of *μ_1* are set to `delta` with all remaining dimensions set to 0,
dimensions 11 through 20 of *μ_2* are set to `delta` with all remaining
dimensions set to 0, and so on.
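The block layout of the means can be sketched directly from the formula for *μ_m*. The function `population_means` below is a hypothetical helper for illustration, not the package's internal code; it returns an *M × p* matrix whose *m*th row is *μ_m*, using *p = 10* and *M = 5* so that each population gets 2 coordinates.

```r
# Hypothetical helper (not the package's internal code): builds the M x p
# matrix whose m-th row is mu_m = delta * sum_{j=1}^{p/M} e_{(p/M)(m-1)+j}.
population_means <- function(p, M, delta) {
  stopifnot(p %% M == 0)
  block <- p / M
  mu <- matrix(0, nrow = M, ncol = p)
  for (m in seq_len(M)) {
    # coordinates (p/M)(m-1)+1, ..., (p/M)m belong to population m
    mu[m, (block * (m - 1) + 1):(block * m)] <- delta
  }
  mu
}

mu <- population_means(p = 10, M = 5, delta = 2)
mu
# Every row is at distance delta * sqrt(p / M) = 2 * sqrt(2) from the origin
sqrt(rowSums(mu^2))
```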

We use a common covariance matrix *Σ_m = Σ*
for all populations.

Smaller values of *c_m* produce heavier tails and, therefore, more outlying
observations on average.
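To put a number on "heavier tails", note that when *Σ = I_p* each coordinate of an observation from *T_p(μ_m, Σ, c_m)* is its location plus a univariate t variate with *c_m* degrees of freedom. The sketch below, assuming *Σ = I_p*, compares the probability that a coordinate falls more than 3 units from its location for several degrees of freedom, with the normal limit for reference.

```r
# P(|T| > 3) for a univariate t with df degrees of freedom; with Sigma = I_p
# this is the chance a single coordinate lies > 3 units from its location.
df_values <- c(2, 6, 30)
tail_prob <- 2 * pt(-3, df = df_values)
names(tail_prob) <- paste0("df = ", df_values)
tail_prob

# Normal limit (df -> Inf) for comparison
normal_tail <- 2 * pnorm(-3)
normal_tail
```

For *c_m = 2* the probability is exactly *1 - 3/√11*, roughly 9.5%, and it shrinks toward the normal value as the degrees of freedom grow.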

By default, we let *M = 5*, *Δ = 0*, *Σ_m = I_p*, and *c_m = 6*,
*m = 1, …, M*, where *I_p* denotes the *p × p* identity matrix.
Furthermore, we generate 25 observations from each population by default.
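The sampling scheme described above can be sketched in base R with the standard scale-mixture representation of the multivariate t: if *Z ~ N_p(0, Σ)* and *W ~ χ²_c / c* independently, then *μ + Z/√W* follows *T_p(μ, Σ, c)*. The function `sim_student_sketch` is hypothetical and only illustrates the idea; the package's `sim_student()` may draw its samples differently, so results will not match it even with the same seed. The default `p = 50` is our assumption, chosen so that *p/M = 10* as in the default layout.

```r
# Hypothetical re-implementation for illustration only (not the package code).
# Assumption: default p = 50 so that p / M = 10 with M = 5.
sim_student_sketch <- function(n = rep(25, 5), p = 50, df = rep(6, 5),
                               delta = 0, Sigma = diag(p), seed = NULL) {
  M <- length(n)
  stopifnot(length(df) == M, p %% M == 0, delta >= 0)
  if (!is.null(seed)) set.seed(seed)
  R <- chol(Sigma)  # upper-triangular R with t(R) %*% R = Sigma
  block <- p / M
  x <- vector("list", M)
  for (m in seq_len(M)) {
    # mean vector: delta on the m-th block of p / M coordinates, 0 elsewhere
    mu <- numeric(p)
    mu[(block * (m - 1) + 1):(block * m)] <- delta
    # rows of Z are N_p(0, Sigma)
    Z <- matrix(rnorm(n[m] * p), nrow = n[m]) %*% R
    # one chi-squared(df_m) / df_m mixing weight per observation
    W <- rchisq(n[m], df = df[m]) / df[m]
    # dividing row i by sqrt(W[i]) gives multivariate t; sweep() adds mu
    x[[m]] <- sweep(Z / sqrt(W), 2, mu, "+")
  }
  list(x = do.call(rbind, x), y = rep(seq_len(M), times = n))
}

sim <- sim_student_sketch(n = 10 * seq_len(5), p = 10, delta = 2, seed = 42)
dim(sim$x)
table(sim$y)
```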

For *Δ = 0* and *c_m = c*, *m = 1,
…, M*, the *M* populations are equal.

A named list containing:

- `x`: a matrix whose rows are the generated observations and whose columns are the `p` features (variables)
- `y`: a vector denoting the population from which the observation in each row was generated

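```
library(sparsediscrim)

# Unequal sample sizes (10, 20, ..., 50) with a fixed seed for reproducibility
data_generated <- sim_student(n = 10 * seq_len(5), seed = 42)
dim(data_generated$x)
table(data_generated$y)

# p = 10 with M = 5 populations, so each mean has p/M = 2 coordinates set to
# delta; df = 2 gives very heavy tails (infinite variance)
data_generated2 <- sim_student(p = 10, delta = 2, df = rep(2, 5))
table(data_generated2$y)

# Per-population sample means, stacked into one row per population
sample_means <- with(data_generated2,
                     tapply(seq_along(y), y, function(i) {
                       colMeans(x[i, ])
                     }))
(sample_means <- do.call(rbind, sample_means))
```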
