generate_imbalanced_data: Generate an imbalanced data set.


Description

generate_imbalanced_data is a simple function to generate a two-class imbalanced data set.

Usage

generate_imbalanced_data(num_examples = 100L, num_features = 2L,
  imbalance_ratio = 5, noise_maj = 0.05, noise_min = 0.1, seed = NULL)

Arguments

num_examples

Total number of examples in the data set.

num_features

Total number of features in the data set.

imbalance_ratio

Ratio of the number of examples in the majority class to the number of examples in the minority class.

noise_maj

Fraction of the minority class that is mislabelled as majority class.

noise_min

Fraction of the majority class that is mislabelled as minority class.

seed

Integer seed for the random number generator, so that the generated data set is reproducible.

Details

The generated data set has two classes. The majority class is drawn from a multivariate normal distribution with mean zero and unit standard deviation for every feature. The minority class is drawn from a multivariate normal distribution with mean two and unit standard deviation for every feature.

The total number of examples and the dimensionality of the data are set through the num_examples and num_features arguments. The imbalance_ratio argument, together with num_examples, determines the exact number of examples in the majority and minority classes. To simulate label noise, approximately a fraction noise_min of the examples in the majority class are labelled as minority class examples, and approximately a fraction noise_maj of the examples in the minority class are labelled as majority class examples. Both noise_maj and noise_min are fractions between 0 and 1.
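The procedure described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's actual implementation: the function name sketch_imbalanced_data, the rounding rule used to split num_examples between the two classes, the class labels "majority" and "minority", and the column name target are all hypothetical.

```r
# Hypothetical re-implementation of the described procedure (not bimba's code).
sketch_imbalanced_data <- function(num_examples = 100L, num_features = 2L,
                                   imbalance_ratio = 5, noise_maj = 0.05,
                                   noise_min = 0.1, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)

  # Assumed split: the minority class gets round(n / (ratio + 1)) examples,
  # the majority class gets the remainder.
  n_min <- as.integer(round(num_examples / (imbalance_ratio + 1)))
  n_maj <- as.integer(num_examples) - n_min

  # Majority features ~ N(0, 1), minority features ~ N(2, 1), independent.
  x_maj <- matrix(rnorm(n_maj * num_features, mean = 0, sd = 1), nrow = n_maj)
  x_min <- matrix(rnorm(n_min * num_features, mean = 2, sd = 1), nrow = n_min)

  y_maj <- rep("majority", n_maj)
  y_min <- rep("minority", n_min)

  # Label noise: a fraction noise_min of the majority class is relabelled as
  # minority, and a fraction noise_maj of the minority class as majority.
  flip_maj <- sample.int(n_maj, size = round(noise_min * n_maj))
  flip_min <- sample.int(n_min, size = round(noise_maj * n_min))
  y_maj[flip_maj] <- "minority"
  y_min[flip_min] <- "majority"

  data.frame(rbind(x_maj, x_min),
             target = factor(c(y_maj, y_min),
                             levels = c("majority", "minority")))
}

d <- sketch_imbalanced_data(seed = 1)
table(d$target)
```

Under the assumed rounding rule, the defaults (100 examples, ratio 5) give 83 majority and 17 minority examples before any labels are flipped; the real function may round differently.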

Value

A data frame containing an imbalanced two-class data set.
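A call might look like the following. The argument names and defaults are taken from the Usage section; the sketch assumes that the bimba package is installed and that the function is exported, and it makes no assumption about the column names of the returned data frame.

```r
# Runs only if bimba is available; otherwise the block is skipped.
if (requireNamespace("bimba", quietly = TRUE)) {
  imb <- bimba::generate_imbalanced_data(num_examples = 200L,
                                         num_features = 2L,
                                         imbalance_ratio = 9,
                                         noise_maj = 0,
                                         noise_min = 0,
                                         seed = 42L)
  # Inspect the structure rather than relying on specific column names.
  str(imb)
}
```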


RomeroBarata/bimba documentation built on May 17, 2019, 8:03 a.m.