R package of data complexity measures for imbalanced classification tasks. These measures were adapted from Ho and Basu [1] and published in Barella et al. [2]. The package's main contribution is a decomposition by class of the original measures, which proved effective at describing imbalanced classification tasks [2]. The implementation is based on the package ECoL (https://github.com/lpfgarcia/ECoL) [3].
The decomposed data complexity measures fall into three groups: (1) feature overlapping measures, (2) neighborhood measures, and (3) linearity measures. They are listed below:
Measures of overlapping
Measures of neighborhood information
Measures of linearity
This package is not available on CRAN but it can be installed with devtools.
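## Install devtools only if it is not already available
if (!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools")
}
## Install ImbCoL from GitHub and attach it
devtools::install_github("victorhb/ImbCoL")
library("ImbCoL")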
The simplest way to compute the complexity measures is to call the complexity function. It accepts either a formula together with a data.frame, or a data.frame of features together with a vector of class labels. To extract a specific group of measures, use the function named after that group. A simple example is given next:
## Extract all complexity measures available
ImbCoL::complexity(Species ~ ., iris)
## Extract all complexity measures using data frame
ImbCoL::complexity(iris[,1:4], iris[,5])
## Extract the overlapping measures
ImbCoL::overlapping(Species ~ ., iris)
## Extract the decomposed N3 measure using the neighborhood function
ImbCoL::neighborhood(Species ~ ., iris, measures="N3_partial")
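Because the package targets imbalanced tasks, it may help to try the same calls on a skewed dataset. The sketch below is an illustration only: the name imb_iris and the 50/50/10 class split are assumptions made for this example, not part of the package. It also assumes that a linearity function is exported alongside overlapping and neighborhood, as the measure groups above suggest. The ImbCoL calls are guarded so the block still runs when the package is not installed.

```r
# Build an artificially imbalanced version of iris (hypothetical example data):
# keep all setosa and versicolor rows, but only the first 10 virginica rows.
imb_iris <- iris[c(1:100, 101:110), ]

# Check the resulting class distribution
class_counts <- table(imb_iris$Species)
print(class_counts)

if (requireNamespace("ImbCoL", quietly = TRUE)) {
  # Same formula interface as in the examples above, applied to the skewed data
  all_measures <- ImbCoL::complexity(Species ~ ., imb_iris)

  # Assumption: the linearity group has its own function, like the other groups
  lin_measures <- ImbCoL::linearity(Species ~ ., imb_iris)

  # Inspect the structure of the results rather than assuming their shape
  str(all_measures)
  str(lin_measures)
}
```

Comparing the per-class values for the minority class (virginica here) against those for the majority classes is the kind of analysis the class decomposition is meant to support.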
The implementation of ImbCoL is based on the implementation of ECoL, so the two packages share function names. When using both packages, we suggest prefixing calls with the namespace ImbCoL:: to avoid name conflicts.
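A minimal sketch of this advice, assuming both packages are installed and that your installed ECoL version exports an overlapping function with the same formula interface (worth checking against its documentation). The variable names are hypothetical.

```r
# Call each package's function through its namespace instead of attaching both
# packages with library(), so the name overlapping() is never ambiguous.
have_both <- requireNamespace("ImbCoL", quietly = TRUE) &&
  requireNamespace("ECoL", quietly = TRUE)

if (have_both) {
  by_class <- ImbCoL::overlapping(Species ~ ., iris)  # ImbCoL version (decomposed by class)
  original <- ECoL::overlapping(Species ~ ., iris)    # ECoL version (original measures)

  # Inspect both results side by side
  str(by_class)
  str(original)
}
```

With explicit prefixes, the result does not depend on the order in which the packages were attached.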
To cite ImbCoL in publications use:
Barella, V. H., Garcia, L. P., de Souto, M. C., Lorena, A. C., & de Carvalho, A. C. (2021). Assessing the data complexity of imbalanced datasets. Information Sciences, 553, 83-109.
Barella, V. H., Garcia, L. P., de Souto, M. P., Lorena, A. C., & De Carvalho, A. (2018, July). Data Complexity Measures for Imbalanced Classification Tasks. In 2018 International Joint Conference on Neural Networks (IJCNN) (pp. 1-8). IEEE.
To report bugs or request features, open an issue on the project's issue tracker.
[1] Ho, T., and Basu, M. (2002). Complexity measures of supervised classification problems. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(3):289-300.
[2] Barella, V. H., Garcia, L. P., de Souto, M. P., Lorena, A. C., and De Carvalho, A. (2018, July). Data Complexity Measures for Imbalanced Classification Tasks. In 2018 International Joint Conference on Neural Networks (IJCNN) (pp. 1-8). IEEE.
[3] Garcia, L., Lorena, A., and Lehmann, J. (2018). ECoL: Complexity Measures for Classification Problems. https://CRAN.R-project.org/package=ECoL.
[4] R Core Team (2019). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.
[5] Barella, V. H., Garcia, L. P., de Souto, M. C., Lorena, A. C., & de Carvalho, A. C. (2021). Assessing the data complexity of imbalanced datasets. Information Sciences, 553, 83-109.