ADASYN: The ADASYN algorithm.


Description

ADASYN over-samples the input data using the Adaptive Synthetic Sampling algorithm.

Usage

ADASYN(data, perc_min = 50, perc_over = NULL, k = 5, classes = NULL)

Arguments

data

A data frame containing the predictors and the outcome. The predictors must be numeric, and the outcome must be a binary-valued factor stored in the last column of data.

perc_min

The desired % size of the minority class relative to the whole data set. For instance, if perc_min = 50 the returned data set is balanced. perc_min is ignored if perc_over is specified.

perc_over

The % of examples to append to the input data set relative to the size of the minority class. For instance, if perc_over = 100 the minority class doubles in size. If specified, perc_min is ignored.

k

Number of nearest neighbours to compute for each example in the minority class.

classes

A named vector identifying the majority and the minority classes. The names must be "Majority" and "Minority". This argument is only useful if the function is called inside another sampling function.

Details

ADASYN is an adaptation of the SMOTE algorithm that synthesises more examples for the minority examples that are considered "hard" to learn. The learning hardness of a minority example is proportional to the number of majority examples among its k nearest neighbours. There are two cases in which no examples are synthesised for a minority example. The first is when all k nearest neighbours belong to the majority class, in which case the minority example is considered to be noise. The second is when all k nearest neighbours belong to the minority class, in which case the minority example is considered too easy to learn (learning hardness = 0).
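The hardness computation described above can be sketched in base R. This is only an illustration, not the package's code: learning_hardness is a hypothetical helper, Euclidean distance is assumed, and the normalisation of the hardness values into sampling weights follows the original ADASYN paper rather than anything stated on this page.

```r
# Hypothetical helper (not part of bimba): for each minority example, count
# the majority examples among its k nearest neighbours (Euclidean distance).
learning_hardness <- function(X, y, minority, k = 5) {
  D <- as.matrix(dist(as.matrix(X)))  # pairwise Euclidean distances
  diag(D) <- Inf                      # an example is not its own neighbour
  min_idx <- which(y == minority)
  n_maj <- vapply(min_idx, function(i) {
    nn <- order(D[i, ])[seq_len(k)]   # indices of the k nearest neighbours
    sum(y[nn] != minority)
  }, integer(1))
  # Noise (all k neighbours are majority) gets no synthetic examples;
  # easy examples (no majority neighbours) already have hardness 0.
  hardness <- ifelse(n_maj == k, 0, n_maj / k)
  total <- sum(hardness)
  weight <- if (total > 0) hardness / total else hardness
  data.frame(index = min_idx, n_maj = n_maj, hardness = hardness, weight = weight)
}

# Toy one-dimensional data set, chosen so that each case appears with k = 2.
x <- c(0.0, 0.1, 0.2, 2.0, 1.9, 2.1, 3.0, 3.05, 2.9, 4.0, 4.1, 4.2)
y <- factor(c("min", "min", "min", "min", "maj", "maj",
              "min", "min", "maj", "maj", "maj", "maj"))
h <- learning_hardness(data.frame(x = x), y, minority = "min", k = 2)
print(h)
```

In this toy set the three clustered minority examples are easy (hardness 0), the minority example at 2.0 is treated as noise, and the two examples near 3.0 share all of the sampling weight.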

Compared to ADASYN's original description, the current implementation has a few differences. Firstly, the d_{th} parameter was dropped. Secondly, the β parameter was replaced by the perc_min and perc_over parameters. This modification allows the user to synthesise as many examples as wanted. β = 1 is equivalent to perc_min = 50 (a balanced distribution of examples).
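The arithmetic behind perc_min and perc_over can be made concrete with a short sketch. n_synthetic is a hypothetical helper, not a bimba function, and the rounding shown here is an assumption; the package may round differently.

```r
# Hypothetical helper (not part of bimba): number of synthetic minority
# examples implied by perc_min or perc_over. The rounding is an assumption.
n_synthetic <- function(n_min, n_maj, perc_min = 50, perc_over = NULL) {
  if (!is.null(perc_over)) {
    # perc_over takes precedence: a % of the current minority class size.
    return(round(n_min * perc_over / 100))
  }
  stopifnot(perc_min > 0, perc_min < 100)
  p <- perc_min / 100
  # Solve (n_min + s) / (n_min + n_maj + s) = p for s.
  s <- (p * (n_min + n_maj) - n_min) / (1 - p)
  max(0, round(s))
}

n_synthetic(n_min = 20, n_maj = 80)                   # 60: classes end up 80 / 80
n_synthetic(n_min = 20, n_maj = 80, perc_over = 100)  # 20: minority class doubles
```

With perc_min = 50 the result equals n_maj - n_min, which is the number of examples the original paper generates when β = 1.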

Value

A data frame containing a more balanced version of the input data set after over-sampling it with ADASYN.
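A minimal usage sketch, assuming the bimba package is installed (it is distributed on GitHub as RomeroBarata/bimba). The toy data, the class labels "neg" and "pos", and the column names are made up for illustration; the call itself uses only the arguments listed under Usage, and it is skipped when the package is not available.

```r
set.seed(42)

# Toy data: two numeric predictors and a binary factor outcome in the last column.
df <- data.frame(
  x1 = c(rnorm(90, mean = 0), rnorm(10, mean = 2)),
  x2 = c(rnorm(90, mean = 0), rnorm(10, mean = 2)),
  y  = factor(c(rep("neg", 90), rep("pos", 10)))
)
print(table(df$y))  # class counts before over-sampling

# Run ADASYN only if bimba is available in this R installation.
if (requireNamespace("bimba", quietly = TRUE)) {
  balanced <- bimba::ADASYN(df, perc_min = 50, k = 5)
  # The outcome is the last column; inspect the class counts after over-sampling.
  print(table(balanced[[ncol(balanced)]]))
}
```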

References

He, H., Bai, Y., Garcia, E. A., & Li, S. (2008, June). ADASYN: Adaptive synthetic sampling approach for imbalanced learning. In 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence) (pp. 1322-1328). IEEE.


RomeroBarata/bimba documentation built on May 17, 2019, 8:03 a.m.