resample_imbalanced: Resampling imbalanced data for classification problems

View source: R/deepDummy.r

Description

resample_imbalanced resamples an imbalanced data set to balance its class distribution, using oversampling, undersampling or SMOTE.

Usage

resample_imbalanced(
  dataset,
  x,
  y,
  n = 1L,
  k = 1L,
  type = c("oversampling", "undersampling", "smote")
)
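
A minimal usage sketch, assuming the deepANN package is installed and that the arguments behave as described under Arguments. The toy data frame `toy` and its column names are made up for illustration, and the call is guarded so the snippet also runs where deepANN is not available.

```r
# Hypothetical toy data: 20 rows of class "a" (majority), 5 rows of class "b" (minority)
set.seed(42)
toy <- data.frame(
  f1 = rnorm(25),
  f2 = rnorm(25),
  label = factor(rep(c("a", "b"), times = c(20, 5)))
)
print(table(toy$label))

# Call the function only if the package is installed;
# argument names follow the Usage block above
if (requireNamespace("deepANN", quietly = TRUE)) {
  balanced <- deepANN::resample_imbalanced(
    toy,
    x = c("f1", "f2"),
    y = "label",
    n = 15L,
    type = "oversampling"
  )
  str(balanced)
}
```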

Arguments

dataset

An imbalanced data set, usually a data frame.

x

The names or indices of the feature columns within dataset.

y

The names or indices of the target columns with class labels (categories) within dataset.

n

For types oversampling and smote, the number of newly created samples; for type undersampling, the percentage of deleted samples.
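
To illustrate the two meanings of n, here is a base R sketch of random oversampling and undersampling. It is not the package's implementation: the helper names `oversample_rows` and `undersample_rows`, sampling with replacement, and the rounding of n% to whole rows are all assumptions.

```r
# Copy n randomly chosen minority-class rows and append them
# (assumption: rows are drawn with replacement)
oversample_rows <- function(dataset, y, n = 1L) {
  counts <- table(dataset[[y]])
  minority <- names(counts)[which.min(counts)]
  idx <- which(dataset[[y]] == minority)
  picked <- idx[sample.int(length(idx), size = n, replace = TRUE)]
  rbind(dataset, dataset[picked, , drop = FALSE])
}

# Delete n percent of the majority-class rows
# (assumption: the row count is rounded to the nearest integer)
undersample_rows <- function(dataset, y, n = 1L) {
  counts <- table(dataset[[y]])
  majority <- names(counts)[which.max(counts)]
  idx <- which(dataset[[y]] == majority)
  n_drop <- round(length(idx) * n / 100)
  if (n_drop == 0) return(dataset)
  drop <- idx[sample.int(length(idx), size = n_drop)]
  dataset[-drop, , drop = FALSE]
}

set.seed(1)
df <- data.frame(f1 = rnorm(25), label = rep(c("a", "b"), times = c(20, 5)))
over <- oversample_rows(df, "label", n = 15L)   # adds 15 copies of class "b" rows
under <- undersample_rows(df, "label", n = 75)  # drops 75% of the 20 class "a" rows
print(table(over$label))
print(table(under$label))
```

With 20 majority and 5 minority rows, adding 15 minority rows or deleting 75% of the majority rows both lead to equal class counts.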

k

The number of nearest neighbors; only relevant for type smote.

type

The technique used to create a balanced data set:
oversampling: copies n rows of the minority class (the under-represented category).
undersampling: deletes n% of the rows of the majority class (the over-represented category).
smote: Synthetic Minority Oversampling Technique (SMOTE); creates n synthetic rows of the minority class based on the k nearest neighbors.
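
In SMOTE as described by Chawla et al. (2002), each synthetic row lies on the line segment between a minority sample and one of its k nearest minority neighbors. The base R sketch below shows that interpolation step. It is not the package's code: the name `smote_sketch`, the use of Euclidean distance and the restriction to numeric features are assumptions.

```r
# X: numeric features of the minority class only (one row per sample)
smote_sketch <- function(X, n = 1L, k = 1L) {
  X <- as.matrix(X)
  stopifnot(is.numeric(X), nrow(X) > k)
  d <- as.matrix(dist(X))   # pairwise Euclidean distances
  diag(d) <- Inf            # a sample is not its own neighbor
  synthetic <- matrix(NA_real_, nrow = n, ncol = ncol(X),
                      dimnames = list(NULL, colnames(X)))
  for (i in seq_len(n)) {
    base <- sample.int(nrow(X), 1L)             # random minority sample
    neighbors <- order(d[base, ])[seq_len(k)]   # indices of its k nearest neighbors
    nb <- neighbors[sample.int(k, 1L)]          # pick one neighbor at random
    gap <- runif(1)                             # random position on the segment
    synthetic[i, ] <- X[base, ] + gap * (X[nb, ] - X[base, ])
  }
  synthetic
}

set.seed(7)
minority <- data.frame(f1 = c(1, 2, 3, 4, 5), f2 = c(2, 4, 6, 8, 10))
new_rows <- smote_sketch(minority, n = 10L, k = 2L)
print(new_rows)
```

Because the toy minority samples all satisfy f2 = 2 * f1, every interpolated row does too, which makes the result easy to inspect.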

Value

A balanced data set.
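
One way to check how balanced a returned data set is: tabulate the target column and compare class counts. A small base R sketch, where `imbalance_ratio` is a hypothetical helper and not part of deepANN:

```r
# Ratio of the largest to the smallest class count; 1 means perfectly balanced
imbalance_ratio <- function(labels) {
  counts <- table(labels)
  max(counts) / min(counts)
}

labels_before <- rep(c("a", "b"), times = c(20, 5))
labels_after  <- rep(c("a", "b"), times = c(20, 20))
ratio_before <- imbalance_ratio(labels_before)  # 20 / 5 = 4
ratio_after  <- imbalance_ratio(labels_after)   # 20 / 20 = 1
print(c(before = ratio_before, after = ratio_after))
```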

References

Chawla, Nitesh V., Bowyer, Kevin W., Hall, Lawrence O., Kegelmeyer, W. Philip (2002): SMOTE: Synthetic Minority Over-sampling Technique. In: Journal of Artificial Intelligence Research, 16, 321-357. https://doi.org/10.1613/jair.953
https://www.cs.cmu.edu/afs/cs/project/jair/pub/volume16/chawla02a-html/chawla2002.html
http://rikunert.com/SMOTE_explained

See Also

Other Dummifying: append_rows(), dummify(), dummify_multilabel(), effectcoding(), one_hot_decode(), one_hot_encode(), remove_columns(), sparse_encode()


stschn/deepANN documentation built on June 25, 2024, 7:27 a.m.