Wrapper function to generate data from a variety of data-generating models.

Description

We provide a wrapper function to generate data from three data-generating models:

sim_unif

Five multivariate uniform distributions

sim_normal

Multivariate normal distributions with intraclass covariance matrices

sim_student

Multivariate Student's t distributions each with a common covariance matrix

Usage

  sim_data(family = c("uniform", "normal", "student"), ...)

Arguments

family

the family of distributions from which to generate data

...

optional arguments that are passed to the data-generating function

Details

For each data-generating model, we generate n_m observations (m = 1, …, M) from each of M multivariate distributions. The population centroids are placed so that each lies at the same Euclidean distance from the origin, and this common distance is scaled by Δ ≥ 0. For each model, the argument delta sets Δ and thereby controls the separation between populations.
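The idea of equidistant centroids scaled by delta can be illustrated with a minimal base R sketch. It is not the package's implementation: the helper sketch_sim, its arguments n, M and p, the choice of centroids at delta times the standard basis vectors, and the identity covariance are all assumptions made for illustration only.

```r
# Hypothetical helper, not part of the package. It draws n observations from
# each of M spherical normal populations in p dimensions. Population m is
# centered at delta * e_m (an assumed layout), so every centroid lies at
# Euclidean distance delta from the origin.
sketch_sim <- function(n = 25, M = 3, p = 5, delta = 1) {
  stopifnot(M <= p, delta >= 0)
  # M x p matrix whose m-th row is the centroid of population m
  centroids <- delta * diag(p)[seq_len(M), , drop = FALSE]
  x <- do.call(rbind, lapply(seq_len(M), function(m) {
    noise <- matrix(rnorm(n * p), nrow = n, ncol = p)
    # shift every row of the noise matrix by the centroid of population m
    sweep(noise, 2, centroids[m, ], "+")
  }))
  # population label for each row of x
  y <- factor(rep(seq_len(M), each = n))
  list(x = x, y = y, centroids = centroids)
}

set.seed(42)
sketch <- sketch_sim(delta = 2)
sqrt(rowSums(sketch$centroids^2))  # distance of each centroid from the origin
dim(sketch$x)                      # observations by features
table(sketch$y)                    # observations per population
```

With delta = 0 all centroids coincide at the origin, so the populations in this sketch overlap completely; larger values of delta move them apart.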

This wrapper function is useful for simulation studies in which the efficacy of supervised and unsupervised learning methods and algorithms is evaluated as the population separation is increased.

Value

A named list containing:

x:

A matrix whose rows are the observations generated and whose columns are the p features (variables)

y:

A vector denoting the population from which the observation in each row was generated.

Examples

set.seed(42)
uniform_data <- sim_data(family = "uniform")
normal_data <- sim_data(family = "normal", delta = 2)
student_data <- sim_data(family = "student", delta = 1, df = 1:5)