simdata: Wrapper function to generate data from a variety of...


View source: R/simdata-wrapper.r

Description

We provide a wrapper function to generate random variates from any of the following data-generating families:

simdata_normal:

Multivariate normal

simdata_t:

Multivariate Student's t

simdata_uniform:

Multivariate uniform

simdata_contaminated:

Multivariate contaminated normal

simdata_guo:

Simulation configuration from Guo et al. (2007)

simdata_friedman:

Six simulation configurations from Friedman (1989)

Usage

simdata(family = c("uniform", "normal", "t", "contaminated", "guo", "friedman"),
  ...)

Arguments

family

the family of distributions from which to generate data

...

optional arguments that are passed to the data-generating function
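
As an illustration of how a wrapper with this signature can forward its optional arguments, here is a minimal base R sketch. It is not the sortinghat source: toy_normal, toy_uniform, toy_simdata, and their parameters are hypothetical stand-ins, and the only assumption carried over from this page is the family argument followed by `...`.

```r
# Hypothetical stand-ins for two data-generating functions (not sortinghat code).
toy_normal <- function(n = c(5, 5), shift = 1, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  x <- rbind(
    matrix(rnorm(n[1] * 2), ncol = 2),               # class 1: standard normal
    matrix(rnorm(n[2] * 2, mean = shift), ncol = 2)  # class 2: shifted mean
  )
  list(x = x, y = factor(rep(1:2, times = n)))
}

toy_uniform <- function(n = c(5, 5), delta = 1, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  x <- rbind(
    matrix(runif(n[1] * 2), ncol = 2),                               # class 1: U(0, 1)
    matrix(runif(n[2] * 2, min = delta, max = delta + 1), ncol = 2)  # class 2: shifted
  )
  list(x = x, y = factor(rep(1:2, times = n)))
}

# Hypothetical wrapper: validate `family`, pick a generator, forward `...` unchanged.
toy_simdata <- function(family = c("uniform", "normal"), ...) {
  family <- match.arg(family)
  generator <- switch(family, uniform = toy_uniform, normal = toy_normal)
  generator(...)
}

sim <- toy_simdata(family = "normal", n = c(3, 4), seed = 42)
table(sim$y)
```

With this design, the wrapper needs no knowledge of each generator's parameters; an argument that the selected generator does not accept raises an "unused argument" error from that generator.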

Details

This wrapper function is useful for simulation studies in which the performance of supervised and unsupervised learning methods is evaluated. For each data-generating model, we generate n_k observations from the kth of K multivariate distributions (k = 1, …, K).

Each family returns a list containing a matrix of the generated multivariate observations and the class label of each observation.

For details about an individual data-generating family, please see its respective documentation.
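
The sampling scheme described in this section can be sketched in base R as follows. This is an illustration only, under the assumption of K multivariate normal populations with identity covariance; simulate_classes and its arguments are hypothetical names and do not belong to sortinghat.

```r
# Hypothetical helper (not part of sortinghat): draw n[k] observations from
# the kth population, here N(means[[k]], I_p), and record the class labels.
simulate_classes <- function(n, means, seed = NULL) {
  stopifnot(length(n) == length(means))
  if (!is.null(seed)) set.seed(seed)
  p <- length(means[[1]])
  blocks <- lapply(seq_along(n), function(k) {
    # n[k] x p matrix of standard normal noise, one observation per row
    noise <- matrix(rnorm(n[k] * p), nrow = n[k], ncol = p)
    # shift each column by the corresponding component of the class mean
    sweep(noise, 2, means[[k]], "+")
  })
  list(
    x = do.call(rbind, blocks),                   # stacked observations
    y = factor(rep(seq_along(n), times = n))      # population of each row
  )
}

sim <- simulate_classes(
  n = c(10, 20, 5),
  means = list(c(0, 0), c(3, 0), c(0, 3)),
  seed = 42
)
dim(sim$x)
table(sim$y)
```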

Value

named list containing:

x:

A matrix whose rows are the observations generated and whose columns are the p features (variables)

y:

A vector denoting the population from which the observation in each row was generated.
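
A short sketch of how a list of this shape might be inspected. The list below is built by hand so the snippet runs without the package, and it assumes y is stored as a factor, which this page does not state.

```r
# Hand-built list with the documented shape: 5 observations, p = 2 features.
sim <- list(
  x = matrix(seq_len(10), nrow = 5, ncol = 2),
  y = factor(c(1, 1, 2, 2, 2))  # assumption: labels stored as a factor
)

# One label per row of x.
stopifnot(nrow(sim$x) == length(sim$y))

# Number of observations drawn from each population.
class_sizes <- table(sim$y)

# Per-class feature means: rows are classes, columns are features.
class_means <- apply(sim$x, 2, function(feature) tapply(feature, sim$y, mean))

class_sizes
class_means
```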

Examples

data_normal <- simdata(family = "normal", n = c(10, 20), mean = c(0, 1), cov = diag(2), seed = 42)
data_uniform <- simdata(family = "uniform", delta = 2, seed = 42)
data_friedman <- simdata(family = "friedman", experiment = 4, seed = 42)

sortinghat documentation built on May 30, 2017, 4:52 a.m.