AdasynClassif: ADASYN algorithm for unbalanced classification problems, both...
In paobranco/UBL: An Implementation of Re-Sampling Approaches to Utility-Based Learning for Both Classification and Regression Tasks

AdasynClassif

R Documentation

ADASYN algorithm for unbalanced classification problems, both binary and multi-class.

Description

This function handles unbalanced classification problems using the ADASYN algorithm. This algorithm generates synthetic cases using a SMOTE-like approache. However, the examples of the class(es) where over-sampling is applied are weighted according to their level of difficulty in learning. This means that more synthetic data is generated for cases which are hrder to learn compared to the examples of the same class that are easier to learn. This implementation provides a strategy suitable for both binary and multi-class problems.

Usage

AdasynClassif(form, dat, baseClass = NULL, beta = 1, dth = 0.95,
                          k = 5, dist = "Euclidean", p = 2)

Arguments

`form`	A formula describing the prediction problem
`dat`	A data frame containing the original (unbalanced) data set
`baseClass`	Character specifying the reference class, i.e., the class from which all other classes will be compared to. This can be selected by the user or estimated from the classes distribution. If not defined (the default) the majority class is selected.
`beta`	Either a numeric value indicating the desired balance level after synthetic examples generation, or a named list specifying the selected classes beta value. A beta value of 1 (the default) corresponds to full balancing the classes. See examples.
`dth`	A threshold for the maximum tolerated degree of class imbalance ratio. Defaults to 0.95, meaning that the strategy is applied if the imbalance ratio is more than 5%. This means that the strategy is applied to a class A if, the ratio of this class frequency and the baseClass frequency is less than 95% (\|A\|/\|baseClass\| < 0.95).
`k`	A number indicating the number of nearest neighbors that are used to generate the new examples of the minority class(es).
`dist`	A character string indicating which distance metric to use when determining the k nearest neighbors. See the details. Defaults to "Euclidean".
`p`	A number indicating the value of p if the "p-norm" distance is chosen. Only necessary to define if a "p-norm" is chosen in the `dist` argument. See details.

Details

dist parameter:

The parameter dist allows the user to define the distance metric to be used in the neighbors computation. Although the default is the Euclidean distance, other metrics are available. This allows the computation of distances in data sets with, for instance, both nominal and numeric features. The options available for the distance functions are as follows:

- for data with only numeric features: "Manhattan", "Euclidean", "Canberra", "Chebyshev", "p-norm";

- for data with only nominal features: "Overlap";

- for dealing with both nominal and numeric features: "HEOM", "HVDM".

When the "p-norm" is selected for the dist parameter, it is also necessary to define the value of parameter p. The value of parameter p sets which "p-norm" will be used. For instance, if p is set to 1, the "1-norm" (or Manhattan distance) is used, and if p is set to 2, the "2-norm" (or Euclidean distance) is applied. For more details regarding the distance functions implemented in UBL package please see the package vignettes.

Value

The function returns a data frame with the new data set resulting from the application of ADASYN algorithm.

Author(s)

Paula Branco paobranco@gmail.com, Rita Ribeiro rpribeiro@dcc.fc.up.pt and Luis Torgo ltorgo@dcc.fc.up.pt

References

He, H., Bai, Y., Garcia, E.A. and Li, S., 2008, June. ADASYN: Adaptive synthetic sampling approach for imbalanced learning. In 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence) (pp. 1322-1328). IEEE.

Examples

# Example with an imbalanced multi-class problem
 data(iris)
 dat <- iris[-c(45:75), c(1, 2, 5)]
# checking the class distribution of this artificial data set
 table(dat$Species)
 newdata <- AdasynClassif(Species~., dat, beta=1)
 table(newdata$Species)
 beta <- list("setosa"=1, "versicolor"=0.5)
 newdata <- AdasynClassif(Species~., dat, baseClass="virginica", beta=beta)
 table(newdata$Species)

## Checking visually the created data
par(mfrow = c(1, 2))
plot(dat[, 1], dat[, 2], pch = 19 + as.integer(dat[, 3]),
     col = as.integer(dat[,3]), main = "Original Data",
     xlim=range(newdata[,1]), ylim=range(newdata[,2]))
plot(newdata[, 1], newdata[, 2], pch = 19 + as.integer(newdata[, 3]),
     col = as.integer(newdata[,3]), main = "New Data",
     xlim=range(newdata[,1]), ylim=range(newdata[,2]))

# A binary example
library(MASS)
data(cats)
table(cats$Sex)
Ada1cats <- AdasynClassif(Sex~., cats)
table(Ada1cats$Sex)
Ada2cats <- AdasynClassif(Sex~., cats, beta=5)
table(Ada2cats$Sex)

paobranco/UBL documentation built on Dec. 12, 2024, 9:21 p.m.

paobranco/UBL index

README.md

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

paobranco/UBL
An Implementation of Re-Sampling Approaches to Utility-Based Learning for Both Classification and Regression Tasks

AdasynClassif: ADASYN algorithm for unbalanced classification problems, both...
In paobranco/UBL: An Implementation of Re-Sampling Approaches to Utility-Based Learning for Both Classification and Regression Tasks

ADASYN algorithm for unbalanced classification problems, both binary and multi-class.

Description

Usage

Arguments

Details

Value

Author(s)

References

See Also

Examples

Related to AdasynClassif in paobranco/UBL...

R Package Documentation

Browse R Packages

We want your feedback!

paobranco/UBL An Implementation of Re-Sampling Approaches to Utility-Based Learning for Both Classification and Regression Tasks

AdasynClassif: ADASYN algorithm for unbalanced classification problems, both... In paobranco/UBL: An Implementation of Re-Sampling Approaches to Utility-Based Learning for Both Classification and Regression Tasks

ADASYN algorithm for unbalanced classification problems, both binary and multi-class.

Description

Usage

Arguments

Details

Value

Author(s)

References

See Also

Examples

Related to AdasynClassif in paobranco/UBL...

R Package Documentation

Browse R Packages

We want your feedback!

paobranco/UBL
An Implementation of Re-Sampling Approaches to Utility-Based Learning for Both Classification and Regression Tasks

AdasynClassif: ADASYN algorithm for unbalanced classification problems, both...
In paobranco/UBL: An Implementation of Re-Sampling Approaches to Utility-Based Learning for Both Classification and Regression Tasks