generateSimulationDataset: Generate simulation dataset

View source: R/generateSimulationDataset.R

generateSimulationDataset    R Documentation


Description

Generates a dataset from a mixture of $K$ Gaussian distributions, with $p$ independent, relevant (signal-bearing) features and $p_n$ irrelevant features.
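The generative process described above can be sketched in base R as follows. This is a minimal sketch under stated assumptions, not the package's implementation: it assumes the $K$ component means in each relevant feature are evenly spaced `delta_mu` apart, starting at 0, and that irrelevant features are standard normal noise independent of cluster membership. The function name `simulate_mixture_sketch` is hypothetical.

```r
# Hypothetical sketch, NOT the mdiHelpR implementation. Assumptions:
#   * component means in every relevant feature are 0, delta_mu, ..., (K - 1) * delta_mu
#   * irrelevant features are standard normal noise, independent of the labels
simulate_mixture_sketch <- function(K, n, p, delta_mu = 1, cluster_sd = 1,
                                    pi = rep(1 / K, K), p_n = 0) {
  # Component label for each of the n items, drawn with proportions pi
  cluster_IDs <- sample(seq_len(K), size = n, replace = TRUE, prob = pi)

  # Mean of each component (assumed evenly spaced, delta_mu apart)
  mu <- (seq_len(K) - 1) * delta_mu

  # Relevant features: an n x p matrix. The length-n mean vector is recycled
  # down each column, so row i always uses the mean of item i's component.
  relevant <- matrix(rnorm(n * p, mean = mu[cluster_IDs], sd = cluster_sd),
                     nrow = n, ncol = p)

  # Irrelevant features: an n x p_n matrix of noise with no cluster signal
  irrelevant <- matrix(rnorm(n * p_n), nrow = n, ncol = p_n)

  list(
    data = as.data.frame(cbind(relevant, irrelevant)),
    cluster_IDs = cluster_IDs
  )
}

set.seed(1)
sim <- simulate_mixture_sketch(K = 3, n = 100, p = 4, delta_mu = 2, p_n = 2)
dim(sim$data)  # 100 rows, 6 columns (4 relevant + 2 irrelevant)
```

Drawing each relevant column with a single vectorised `rnorm()` call keeps the sketch short; the per-item mean vector `mu[cluster_IDs]` is what ties the signal-bearing columns to the cluster labels.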

Usage

generateSimulationDataset(
  K,
  n,
  p,
  delta_mu = 1,
  cluster_sd = 1,
  pi = rep(1/K, K),
  p_n = 0
)

Arguments

K

The number of mixture components (clusters) to sample from.

n

The number of samples to draw.

p

The number of relevant (i.e. signal-bearing) features.

delta_mu

The difference between the component means within each feature (defaults to 1).
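The documentation does not say how the $K$ means are arranged within a feature. One plausible reading, assumed here and not confirmed by the source, is that they are evenly spaced `delta_mu` apart starting from 0. The helper `component_means_sketch` below is hypothetical and only illustrates that reading.

```r
# Hypothetical helper (not part of mdiHelpR): the K component means of one
# relevant feature, ASSUMING they are evenly spaced delta_mu apart from 0
component_means_sketch <- function(K, delta_mu = 1) {
  (seq_len(K) - 1) * delta_mu
}

component_means_sketch(K = 4, delta_mu = 1.5)
#> [1] 0.0 1.5 3.0 4.5

# Gap between neighbouring components in units of cluster_sd
delta_mu <- 1.5
cluster_sd <- 1
delta_mu / cluster_sd
```

Under this assumption, how distinguishable neighbouring components are depends on the ratio `delta_mu / cluster_sd` rather than on either argument alone.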

cluster_sd

The standard deviation of the Gaussian distributions (defaults to 1).

pi

A vector of length K giving the mixing proportions of the components (defaults to equal proportions, rep(1/K, K)).
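Mixing proportions like `pi` are typically used to draw each item's component label. The base R sketch below shows one way to do this with `sample()`; it is an illustration, not necessarily how the package draws labels, and the function name `draw_memberships_sketch` is hypothetical.

```r
# Hypothetical sketch: draw component memberships for n items with mixing
# proportions pi (base R only; not necessarily how mdiHelpR does it)
draw_memberships_sketch <- function(n, pi) {
  stopifnot(all(pi >= 0), sum(pi) > 0)
  sample(seq_along(pi), size = n, replace = TRUE, prob = pi)
}

set.seed(42)
ids <- draw_memberships_sketch(n = 10000, pi = c(0.5, 0.3, 0.2))

# Empirical proportions should be close to 0.5, 0.3 and 0.2
prop.table(table(ids))
```

Note that `sample()` itself rescales `prob` to sum to 1; whether the package expects `pi` to be normalised in advance is not stated.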

p_n

The number of irrelevant features (defaults to 0).

Value

A list with two elements: 'data', a data.frame of the generated data, and 'cluster_IDs', a vector giving the cluster membership of each item.
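The returned list can be inspected with base R. So that it runs without the package installed, the sketch below builds a small stand-in with the documented shape (a `data` data.frame plus a `cluster_IDs` vector); its values are made up for illustration. With mdiHelpR available, `sim` would instead be the result of a call such as `generateSimulationDataset(K = 2, n = 200, p = 1)`.

```r
# Stand-in with the documented shape of the return value (values are
# illustrative only, built by hand rather than by the package)
set.seed(7)
cluster_IDs <- sample(1:2, 200, replace = TRUE)
sim <- list(
  data = data.frame(V1 = rnorm(200, mean = c(0, 3)[cluster_IDs])),
  cluster_IDs = cluster_IDs
)

# Number of items in each cluster
table(sim$cluster_IDs)

# Per-cluster mean of the first feature
tapply(sim$data[[1]], sim$cluster_IDs, mean)

# Cross-tabulate an unsupervised clustering against the true labels
fit <- kmeans(sim$data, centers = 2, nstart = 10)
table(truth = sim$cluster_IDs, kmeans = fit$cluster)
```

Keeping the labels separate from `data` makes it straightforward to run a clustering method on the features alone and then score it against `cluster_IDs`.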


stcolema/mdiHelpR documentation built on July 28, 2024, 5:41 a.m.