generate_imbalanced_data: Generate an imbalanced data set.
In RomeroBarata/bimba: Sampling Algorithms for Two-Class Imbalanced Data Sets


generate_imbalanced_data is a simple function to generate a two-class imbalanced data set.

generate_imbalanced_data(num_examples = 100L, num_features = 2L, imbalance_ratio = 5,
  noise_maj = 0.05, noise_min = 0.1, seed = NULL)

`num_examples`	Total number of examples in the data set.
`num_features`	Total number of features in the data set.
`imbalance_ratio`	Ratio of the number of examples in the majority class to the number of examples in the minority class.
`noise_maj`	Fraction of the minority class that is mislabelled as majority class.
`noise_min`	Fraction of the majority class that is mislabelled as minority class.
`seed`	Integer value for reproducibility purposes.

The generated data set has two classes. The majority class is drawn from a multivariate normal distribution with mean zero and unit standard deviation for every feature. The minority class is drawn from a multivariate normal distribution with mean two and unit standard deviation for every feature.
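As a rough illustration of the two distributions described above, the following base R sketch draws points for each class. The helper `sample_class` is hypothetical and not part of bimba, and treating the features as independent (identity covariance) is an assumption, since the text only states the per-feature mean and standard deviation.

```r
# Hypothetical helper (not part of bimba): draw n points whose features are
# independent N(mu, 1). This is a multivariate normal with mean (mu, ..., mu)
# and identity covariance; the identity covariance is an assumption.
sample_class <- function(n, num_features, mu) {
  matrix(rnorm(n * num_features, mean = mu, sd = 1),
         nrow = n, ncol = num_features)
}

set.seed(1)
x_maj <- sample_class(500, 2, mu = 0)  # majority class, centred at 0
x_min <- sample_class(100, 2, mu = 2)  # minority class, centred at 2

colMeans(x_maj)  # should be close to 0 for each feature
colMeans(x_min)  # should be close to 2 for each feature
```

With the class means only two standard deviations apart, the two clouds overlap, so the classes are not perfectly separable even before any label noise is added.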

The total number of examples and the dimensionality of the data are set through the num_examples and num_features arguments. The imbalance_ratio argument, together with num_examples, determines the number of examples in the majority and minority classes. To simulate label noise, approximately a fraction noise_min of the majority class examples are labelled as minority class examples, and approximately a fraction noise_maj of the minority class examples are labelled as majority class examples. Both noise_maj and noise_min are fractions of a class, not absolute counts.
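The arithmetic in the paragraph above can be sketched in base R as follows. The helpers `class_sizes` and `pick_fraction` are hypothetical names, and the rounding rule is an assumption; the package may split and round differently.

```r
# Hypothetical helper: split num_examples into class sizes.
# With r = n_maj / n_min and n = n_maj + n_min, n_min = n / (1 + r).
# Rounding to the nearest integer is an assumption of this sketch.
class_sizes <- function(num_examples, imbalance_ratio) {
  n_min <- round(num_examples / (1 + imbalance_ratio))
  c(majority = num_examples - n_min, minority = n_min)
}

# Hypothetical helper: pick roughly `fraction` of the given indices at random.
pick_fraction <- function(idx, fraction) {
  idx[sample.int(length(idx), round(fraction * length(idx)))]
}

sizes <- class_sizes(100L, 5)   # majority = 83, minority = 17
labels <- rep(c("majority", "minority"), times = sizes)

set.seed(1)
# Select the examples to flip from the clean labels, so none is flipped twice.
to_min <- pick_fraction(which(labels == "majority"), 0.10)  # noise_min
to_maj <- pick_fraction(which(labels == "minority"), 0.05)  # noise_maj

noisy <- labels
noisy[to_min] <- "minority"
noisy[to_maj] <- "majority"

table(clean = labels, noisy = noisy)
```

Because the flipped counts are rounded, the realised noise is only approximately the requested fraction, which matches the "approximately" in the description.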

A data frame containing an imbalanced two-class data set.
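A minimal usage sketch, assuming the bimba package is installed. It uses only the arguments listed above; the column names of the returned data frame are not documented here, so the example inspects the result with `str` rather than assuming them.

```r
# Runs only if bimba is available; the argument values are illustrative.
if (requireNamespace("bimba", quietly = TRUE)) {
  df <- bimba::generate_imbalanced_data(num_examples = 200L,
                                        num_features = 2L,
                                        imbalance_ratio = 9,
                                        noise_maj = 0,
                                        noise_min = 0,
                                        seed = 42)
  str(df)  # inspect the columns and their types
}
```

Setting both noise arguments to zero gives a data set without mislabelled examples, which can be a useful baseline before adding noise.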

RomeroBarata/bimba documentation built on May 17, 2019, 8:03 a.m.