
ImbCoL

R package for data complexity measures for imbalanced classification tasks. These measures were adapted from Ho and Basu [1] and published in Barella et al. [2]. In particular, this package provides a per-class decomposition of the original measures, which proved effective for describing imbalanced classification tasks [2]. The implementation is based on the package ECoL (https://github.com/lpfgarcia/ECoL) [3].

Measures

The decomposed data complexity measures can be grouped into three families: (1) feature overlapping measures, (2) neighborhood measures and (3) linearity measures. They are listed below:

Measures of overlapping

Measures of neighborhood information

Measures of linearity

Installation

This package is not available on CRAN, but it can be installed from GitHub with devtools:

if (!require("devtools")) {
    install.packages("devtools")
}
devtools::install_github("victorhb/ImbCoL")
library("ImbCoL")

Example of use

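The simplest way to compute the complexity measures is the complexity function. It accepts either a formula with a data.frame, or a data.frame of features with a vector of class labels. To extract the measures of a single group, use the function named after that group, such as overlapping or neighborhood; the measures argument selects individual measures. A simple example is given next: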

## Extract all complexity measures available
ImbCoL::complexity(Species ~ ., iris)

## Extract all complexity measures using a data frame and a class vector
ImbCoL::complexity(iris[,1:4], iris[,5])

## Extract the overlapping measures
ImbCoL::overlapping(Species ~ ., iris)

## Extract the decomposed N3 measure using the neighborhood function
ImbCoL::neighborhood(Species ~ ., iris, measures="N3_partial")
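To illustrate what a per-class decomposition means, the base-R sketch below computes a class-wise version of N3, the leave-one-out error rate of a 1-nearest-neighbour classifier [1], returning one error rate per class instead of a single global value. It is not ImbCoL's implementation: the function name n3_by_class is hypothetical, and the feature standardization with scale(), the Euclidean distance and the tie-breaking by which.min are assumptions. The package may preprocess the data differently, so its N3_partial values can differ.

```r
# Hypothetical illustration of a class-wise N3 (not ImbCoL's code).
# Assumptions: features are standardized with scale(), distances are Euclidean,
# and ties for the nearest neighbour go to the first index (which.min).
n3_by_class <- function(x, y) {
  y <- as.factor(y)
  d <- as.matrix(dist(scale(as.matrix(x))))  # pairwise Euclidean distances
  diag(d) <- Inf                             # leave-one-out: an example is not its own neighbour
  nn <- apply(d, 1, which.min)               # index of each example's nearest neighbour
  err <- y[nn] != y                          # TRUE where 1-NN predicts the wrong class
  tapply(err, y, mean)                       # misclassification rate within each class
}

# One value per class of iris
n3_by_class(iris[, 1:4], iris$Species)
```

Pooling the per-example errors over all classes gives the usual global N3. Keeping one value per class means that errors concentrated in a small class are not diluted by a large, easy class.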

Developer notes

ImbCoL is based on the implementation of ECoL, and both packages provide functions with the same names (e.g. complexity). When both packages are loaded, we suggest prefixing calls with the ImbCoL:: namespace to avoid conflicts.

To cite ImbCoL in publications use:

To report bugs or request features, please open an issue on the project's GitHub repository.

References

[1] Ho, T., and Basu, M. (2002). Complexity measures of supervised classification problems. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(3):289-300.

[2] Barella, V. H., Garcia, L. P., de Souto, M. C. P., Lorena, A. C., and de Carvalho, A. C. P. L. F. (2018, July). Data Complexity Measures for Imbalanced Classification Tasks. In 2018 International Joint Conference on Neural Networks (IJCNN) (pp. 1-8). IEEE.

[3] Garcia, L., Lorena, A., and Lehmann, J. (2018). ECoL: Complexity Measures for Classification Problems. R package. https://CRAN.R-project.org/package=ECoL

[4] R Core Team (2019). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.

[5] Barella, V. H., Garcia, L. P., de Souto, M. C. P., Lorena, A. C., and de Carvalho, A. C. P. L. F. (2021). Assessing the data complexity of imbalanced datasets. Information Sciences, 553, 83-109.



victorhb/ImbCoL documentation built on May 20, 2021, 12:18 p.m.